PEPTIDE AND PROTEIN FUSIONS TO THIOREDOXIN AND
THIOREDOXIN-LIKE MOLECULES

The present invention relates generally to the production
of fusion proteins in prokaryotic and eukaryotic cells. More
specifically, the invention relates to the expression in host
cells of recombinant fusion sequences comprising thioredoxin or
thioredoxin-like sequences fused to sequences for selected
heterologous peptides or proteins, and the use of such fusion
molecules to increase the production, activity, stability or
solubility of recombinant proteins and peptides.

Background of the Invention 

Many peptides and proteins can be produced via recombinant
means in a variety of expression systems, e.g., various strains
of bacterial, fungal, mammalian or insect cells. However, when
bacteria are used as host cells for heterologous gene expression,
several problems frequently occur.

For example, heterologous genes encoding small peptides are
often poorly expressed in bacteria. Because of their size, most
small peptides are unable to adopt stable, soluble conformations
and are subject to intracellular degradation by proteases and
peptidases present in the host cell. Those small peptides which
do manage to accumulate when directly expressed in E. coli or
other bacterial hosts are usually found in the insoluble or
"inclusion body" fraction, an occurrence which renders them
almost useless for screening purposes in biological or
biochemical assays.

Moreover, even if small peptides are not produced in
inclusion bodies, the production of small peptides by recombinant
means as candidates for new drugs or enzyme inhibitors encounters
further problems. Even small linear peptides can adopt an
enormous number of potential structures due to their degrees of
conformational freedom. Thus a small peptide can have the
'desired' amino-acid sequence and yet have very low activity in
an assay because the 'active' peptide conformation is only one
of the many alternative structures adopted in free solution.
This presents another difficulty encountered in producing small
heterologous peptides recombinantly for effective research and
therapeutic use.

Inclusion body formation is also frequently observed when
the genes for heterologous proteins are expressed in bacterial
cells. These inclusion bodies usually require further
manipulations in order to solubilize and refold the heterologous
protein, with conditions determined empirically and with
uncertainty in each case.

If these additional procedures are not successful, little
to no protein retaining bioactivity can be recovered from the
host cells. Moreover, these additional processes are often
technically difficult and prohibitively expensive for practical
production of recombinant proteins for therapeutic, diagnostic
or other research uses.

To overcome these problems, the art has employed certain
peptides or proteins as fusion "partners" with a desired
heterologous peptide or protein to enable the recombinant
expression and/or secretion of small peptides or larger proteins
as fusion proteins in bacterial expression systems. Among such
fusion partners are included lacZ and trpE fusion proteins,
maltose-binding protein fusions, and glutathione-S-transferase
fusion proteins [See, generally, Current Protocols in Molecular
Biology, Vol. 2, suppl. 10, publ. John Wiley and Sons, New York,
NY, pp. 16.4.1-16.8.1 (1990); and Smith et al, Gene, 62:31-40
(1988)]. As another example, U.S. Patent 4,801,536 describes
the fusion of a bacterial flagellin protein to a desired protein
to enable the expression of a heterologous gene in a bacterial
cell and the secretion of its product into the culture medium as
a fusion protein.

However, fusions of desired peptides or proteins to other
proteins (i.e., as fusion partners) at the amino- or
carboxyl-termini of these fusion partner proteins often have
other potential disadvantages. Experience in E. coli has shown
that a crucial factor in obtaining high levels of gene expression
is the efficiency of translational initiation. Translational
initiation in E. coli is very sensitive to the nucleotide
sequence surrounding the initiating methionine codon of the
desired heterologous peptide or protein sequence, although the
rules governing this phenomenon are not clear. For this reason,
fusions of sequences at the amino-terminus of many fusion partner
proteins affect expression levels in an unpredictable manner.
In addition, there are numerous amino- and carboxy-peptidases in
E. coli which degrade amino- or carboxyl-terminal peptide
extensions to fusion partner proteins, so that a number of the
known fusion partners have a low success rate for producing
stable fusion proteins.

The purification of proteins produced by recombinant
expression systems is often a serious challenge. There is a
continuing requirement for new and easier methods to produce
homogeneous preparations of recombinant proteins, and yet a
number of the fusion partners currently used in the art possess
no inherent properties that would facilitate the purification
process. Therefore, in the art of recombinant expression
systems, there remains a need for new compositions and processes
for the production and purification of stable, soluble peptides
and proteins for use in research, diagnostic and therapeutic
applications.

Summary of the Invention 

In one aspect, the present invention provides a fusion
sequence comprising a thioredoxin-like protein sequence fused to
a selected heterologous peptide or protein. The peptide or
protein may be fused to the amino terminus of the thioredoxin-like
sequence, the carboxyl terminus of the thioredoxin-like
sequence, or within the thioredoxin-like sequence (e.g., within
the active-site loop of thioredoxin). The fusion sequence
according to this invention may optionally contain a linker
peptide between the thioredoxin-like sequence and the selected
peptide or protein. This linker provides, where needed, a
selected cleavage site or a stretch of amino acids capable of
preventing steric hindrance between the thioredoxin-like molecule
and the selected peptide or protein.

As another aspect, the present invention provides a DNA
molecule encoding the fusion sequence defined above in
association with, and under the control of, an expression control
sequence capable of directing the expression of the fusion
protein in a desired host cell.

Still a further aspect of the invention is a host cell
transformed with, or having integrated into its genome, a DNA
sequence comprising a thioredoxin-like DNA sequence fused to the
DNA sequence of a selected heterologous peptide or protein. This
fusion sequence is desirably under the control of an expression
control sequence capable of directing the expression of a fusion
protein in the cell.

As yet another aspect, there is provided a novel method for
increasing the expression of soluble recombinant proteins. The
method includes culturing under suitable conditions the
above-described host cell to produce the fusion protein.

In one embodiment of this method, if the resulting fusion
protein is cytoplasmic, the cell can be lysed by conventional
means to obtain the soluble fusion protein. More preferably in
the case of cytoplasmic fusion proteins, the method includes
releasing the fusion protein from the host cell by applying
osmotic shock or freeze/thaw treatments to the cell. In this
case the fusion protein is selectively released from the interior
of the cell via the zones of adhesion that exist between the
inner and outer membranes of E. coli. The fusion protein is then
purified by conventional means. In still another embodiment, if
a secretory leader is employed in the fusion protein construct,
the fusion protein can be recovered from a periplasmic extract
or from the cell culture medium. As yet a further step in the
above methods, the desired protein can be cleaved from fusion
with the thioredoxin-like protein by conventional means.

Other aspects and advantages of the present invention will
be apparent upon consideration of the following detailed
description of preferred embodiments thereof.

Summary of the Drawings

Fig. 1 illustrates the DNA sequence of the expression
plasmid pALtrxA/EK/IL11ΔPro-581 (SEQ ID NO: 13) and the amino acid
sequence for the fusion protein therein (SEQ ID NO: 14), described
in Example 1.

Fig. 2 illustrates the DNA sequence (SEQ ID NO: 15) and amino
acid sequence (SEQ ID NO: 16) of the macrophage inhibitory
protein-1α (MIP-1α) protein used in the construction of a
thioredoxin fusion protein described in Example 3.

Fig. 3 illustrates the DNA sequence (SEQ ID NO: 17) and amino
acid sequence (SEQ ID NO: 18) of the bone morphogenetic protein-2
(BMP-2) protein used in the construction of a thioredoxin fusion
protein described in Example 4.

Fig. 4 is a schematic drawing illustrating the insertion of
an enterokinase cleavage site into the active-site loop of E.
coli thioredoxin (trxA) described in Example 5.

Fig. 5 is a schematic drawing illustrating random peptide
insertions into the active-site loop of E. coli thioredoxin
(trxA) described in Example 5.

Fig. 6 illustrates the DNA sequence (SEQ ID NO: 19) and amino
acid sequence (SEQ ID NO: 20) of the human interleukin-6 (IL-6)
protein used in the construction of a thioredoxin fusion protein
described in Example 6.

Fig. 7 illustrates the DNA sequence (SEQ ID NO: 23) and amino
acid sequence (SEQ ID NO: 24) of the M-CSF protein used in the
construction of a thioredoxin fusion protein described in Example
7.

Detailed Description of the Invention 

The methods and compositions of the present invention permit
the production of large amounts of heterologous peptides or
proteins in a stable, soluble form in certain host cells which
normally express limited amounts of such peptides or proteins.
The present invention produces fusion proteins which retain the
desirable characteristics of a thioredoxin-like protein (i.e.,
stability, solubility and a high level of expression). The
invention also allows a small peptide inserted into an internal
region of the thioredoxin-like sequence (e.g., the active site
loop of thioredoxin) to be accessible on the surface of the
molecule. These fusion proteins also permit a peptide or protein
fused at the free ends of the thioredoxin-like protein to achieve
its desired conformation.

According to the present invention, the DNA sequence
encoding a heterologous peptide or protein selected for
expression in a recombinant system is desirably fused to a
thioredoxin-like DNA sequence for expression in the host cell.
A thioredoxin-like DNA sequence is defined herein as a DNA
sequence encoding a protein or fragment of a protein
characterized by an amino acid sequence having at least 30%
homology with the amino acid sequence of E. coli thioredoxin (SEQ
ID NO: 22). Alternatively, a thioredoxin-like DNA sequence is
defined herein as a DNA sequence encoding a protein or fragment
of a protein characterized by having a three-dimensional
structure substantially similar to that of human or E. coli
thioredoxin (SEQ ID NO: 22) and by containing an active site
loop. The DNA sequence of glutaredoxin is an example of a
thioredoxin-like DNA sequence which encodes a protein that
exhibits such substantial similarity in three-dimensional
conformation and contains a Cys....Cys active site loop. The
amino acid sequence of E. coli thioredoxin is described in H.
Eklund et al, EMBO J., 3:1443-1449 (1984). The three-dimensional
structure of E. coli thioredoxin is depicted in Fig. 2 of A.
Holmgren, J. Biol. Chem., 264:13963-13966 (1989). In Fig. 1
below, nucleotides 2242-2568 contain a DNA sequence encoding the
E. coli thioredoxin protein [Lim et al, J. Bacteriol., 163:311-316
(1985)] (SEQ ID NO: 21). A comparison of the three-dimensional
structures of E. coli thioredoxin and glutaredoxin is published
in Xia, Protein Science, 1:310-321 (1992). These four publications
are incorporated herein by reference for the purpose of providing
information on thioredoxin-like proteins that is known to one of
skill in the art.
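
The sequence-based criterion above (at least 30% amino acid
homology with E. coli thioredoxin) can be sketched as a simple
check on a pre-computed pairwise alignment. The sketch below is
illustrative only: it treats "homology" as percent identity over
aligned, non-gap columns, which is an assumption and not a
definition given in this specification, and the function names and
the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Using identity as a proxy for "homology" is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def meets_sequence_criterion(aligned_candidate: str,
                             aligned_reference: str,
                             threshold: float = 30.0) -> bool:
    """True if the candidate reaches the 30% threshold quoted in the text."""
    return percent_identity(aligned_candidate, aligned_reference) >= threshold


if __name__ == "__main__":
    # Toy aligned fragments, not real thioredoxin sequences.
    print(percent_identity("ACDE-G", "ACDFHG"))
```

A candidate that fails this check may still qualify under the
alternative, structure-based definition given above.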

As the primary example of a thioredoxin-like protein useful
in this invention, E. coli thioredoxin (SEQ ID NO: 21 and SEQ ID
NO: 22) has the following characteristics. E. coli thioredoxin
is a small protein, only 11.7 kD, and can be expressed to high
levels (>10%, corresponding to a concentration of 15 µM if cells
are lysed at 10 A/ml). The small size and capacity for high
expression of the protein contribute to a high intracellular
concentration. E. coli thioredoxin is further characterized by
a very stable, tight structure which can minimize the effects on
overall structural stability caused by fusion to the desired
peptide or proteins.
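
The figures quoted above can be cross-checked with a short
calculation. The sketch below converts the stated 15 µM of an
11.7 kD protein into a mass concentration and, under the
assumption that ">10%" means exactly 10% of total cell protein by
mass, back-calculates the total protein concentration this would
imply in the lysate. That reading of ">10%" and the function names
are assumptions made for illustration.

```python
MW_DA = 11_700           # 11.7 kD, as stated in the text
CONC_UM = 15.0           # 15 uM in the lysate, as stated in the text
ASSUMED_FRACTION = 0.10  # assumption: ">10%" taken as exactly 10% by mass


def mass_conc_ug_per_ml(molar_um: float, mw_da: float) -> float:
    """Convert a micromolar concentration to micrograms per millilitre."""
    # (umol/L) * (g/mol) = ug/L; divide by 1000 to get ug/ml.
    return molar_um * mw_da / 1000.0


def implied_total_protein_ug_per_ml(target_ug_per_ml: float,
                                    fraction: float) -> float:
    """Total protein implied if the target is `fraction` of it by mass."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return target_ug_per_ml / fraction


if __name__ == "__main__":
    trx = mass_conc_ug_per_ml(CONC_UM, MW_DA)
    total = implied_total_protein_ug_per_ml(trx, ASSUMED_FRACTION)
    print(f"thioredoxin: {trx:.1f} ug/ml")
    print(f"implied total protein: {total:.0f} ug/ml")
```

Under these assumptions the thioredoxin concentration comes to
roughly 175 µg/ml, implying a total protein concentration on the
order of 1.8 mg/ml in the lysate.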

The three-dimensional structure of E. coli thioredoxin is
known and contains several surface loops, including a unique
Cys....Cys active site loop, between its two active-site cysteine
residues, which protrudes from the body of the protein. This
Cys....Cys active site loop is an identifiable, accessible
surface loop region and is not involved in any interactions with
the rest of the protein that contribute to overall structural
stability. It is therefore a good candidate as a site for peptide
insertions. Both the amino- and carboxyl-termini of E. coli
thioredoxin are on the surface of the protein, and are readily
accessible for fusions. Human thioredoxin, glutaredoxin and other
thioredoxin-like molecules also contain this Cys....Cys active
site loop.

E. coli thioredoxin is also stable to proteases. Thus, E.
coli thioredoxin may be desirable for use in E. coli expression
systems, because as an E. coli protein it is characterized by
stability to E. coli proteases. E. coli thioredoxin is also
stable to heat up to 80°C and to low pH. Other thioredoxin-like
proteins encoded by thioredoxin-like DNA sequences useful
in this invention share homologous amino acid sequences and
similar physical and structural characteristics. Thus, DNA
sequences encoding other thioredoxin-like proteins may be used
in place of E. coli thioredoxin (SEQ ID NO: 21 and SEQ ID NO: 22)
according to this invention. For example, the DNA sequence
encoding another species' thioredoxin, e.g., human thioredoxin,
has been employed by these inventors in the compositions and
methods of this invention. Human thioredoxin has a
three-dimensional structure that is virtually superimposable on
that of E. coli thioredoxin, as determined by comparing
the NMR structures of the two molecules. Human thioredoxin also
contains an active site loop structurally and functionally
equivalent to the Cys....Cys active site loop found in the E.
coli protein. Human IL-11 fused in frame to the carboxyl
terminus of human thioredoxin (i.e., a human thioredoxin/IL-11
fusion) exhibited the same expression characteristics as the
E. coli thioredoxin/IL-11 fusion exemplified in Examples 1-2.
Consequently, human thioredoxin is a thioredoxin-like molecule
and can be used in place of or in addition to E. coli
thioredoxin in the production of proteins and small peptides in
accordance with the method of this invention. Insertions into
the human thioredoxin active site loop and on the amino terminus
may be as well tolerated as those in E. coli thioredoxin.

Other thioredoxin-like sequences which may be employed in
this invention include all or portions of the protein
glutaredoxin and various species' homologs thereof [A. Holmgren,
cited above]. Although E. coli glutaredoxin and E. coli
thioredoxin share less than 20% amino acid homology, the two
proteins do have conformational and functional similarities
[Eklund et al, EMBO J., 3:1443-1449 (1984)] and glutaredoxin
contains an active site loop structurally and functionally
equivalent to the Cys....Cys active site loop of E. coli
thioredoxin. Glutaredoxin is therefore a thioredoxin-like
molecule as herein defined.

The DNA sequence encoding protein disulfide isomerase (PDI),
or that portion thereof containing the thioredoxin-like domain,
and its various species' homologs [J. E. Edman et al, Nature,
312:267-270 (1985)] may also be employed as a thioredoxin-like
DNA sequence, since a repeated domain of PDI shares >30% homology
with E. coli thioredoxin and that repeated domain exhibits a
three-dimensional structure substantially similar to that of
E. coli thioredoxin and contains an active site loop structurally
and functionally equivalent to the Cys....Cys active site loop
of E. coli thioredoxin. These two publications are incorporated
herein by reference for the purpose of providing information on
glutaredoxin and PDI which is known and available to one of skill
in the art.

Similarly, the DNA sequence encoding phosphoinositide-specific
phospholipase C (PI-PLC), fragments thereof and various
species' homologs thereof [C. F. Bennett et al, Nature,
334:268-270 (1988)] may also be employed in the present invention
as a thioredoxin-like sequence based on their amino acid sequence
homology with E. coli thioredoxin, or alternatively based on
similarity in three-dimensional conformation and the presence of
an active site loop structurally and functionally equivalent to
the Cys....Cys active site loop of E. coli thioredoxin. All or
a portion of the DNA sequence encoding an endoplasmic reticulum
protein, such as ERp72, or various species' homologs thereof are
also included as thioredoxin-like DNA sequences for the purposes
of this invention [R. A. Mazzarella et al, J. Biol. Chem.,
265:1094-1101 (1990)] based on amino acid sequence homology, or
alternatively based on similarity in three-dimensional
conformation and the presence of an active site loop structurally
and functionally equivalent to the Cys....Cys active site loop
of E. coli thioredoxin. Another thioredoxin-like sequence is a
DNA sequence which encodes all or a portion of an adult T-cell
leukemia-derived factor (ADF) or other species' homologs thereof
[N. Wakasugi et al, Proc. Natl. Acad. Sci. USA, 87:8282-8286
(1990)]. ADF is now believed to be human thioredoxin. These
three publications are incorporated herein by reference for the
purpose of providing information on PI-PLC, ERp72, and ADF which
is known and available to one of skill in the art.


WO 94/02502 


PCT/US93/06913 



It is expected from the definition of thioredoxin-like DNA
sequence used above that other sequences not specifically
identified above, or perhaps not yet identified or published, may
be thioredoxin-like sequences either based on the 30% amino acid
sequence homology to E. coli thioredoxin or based on having
three-dimensional structures substantially similar to E. coli or
human thioredoxin and having an active site loop functionally and
structurally equivalent to the Cys....Cys active site loop of E.
coli thioredoxin. One skilled in the art can determine whether
a molecule has these latter two characteristics by comparing its
three-dimensional structure, as analyzed for example by x-ray
crystallography or two-dimensional NMR spectroscopy, with the
published three-dimensional structure for E. coli thioredoxin and
by analyzing the amino acid sequence of the molecule to determine
whether it contains an active site loop that is structurally and
functionally equivalent to the Cys....Cys active site loop of E.
coli thioredoxin. By "substantially similar" in three-dimensional
structure or conformation these inventors mean as
similar to E. coli thioredoxin as is glutaredoxin. Based on the
above description, one of skill in the art will be able to select
and identify, or, if desired, modify, a thioredoxin-like DNA
sequence for use in this invention without resort to undue
experimentation. For example, simple point mutations made to
portions of native thioredoxin or native thioredoxin-like
sequences which do not affect the structure of the resulting
molecule are alternative thioredoxin-like sequences, as are
allelic variants of native thioredoxin or native thioredoxin-like
sequences.

DNA sequences which hybridize to the sequence for E. coli
thioredoxin (SEQ ID NO: 21) or its structural homologs under
either stringent or relaxed hybridization conditions also encode
thioredoxin-like proteins for use in this invention. An example
of one such stringent hybridization condition is hybridization
at 4X SSC at 65°C, followed by a washing in 0.1X SSC at 65°C for
an hour. Alternatively, an exemplary stringent hybridization
condition is in 50% formamide, 4X SSC at 42°C. Examples of
non-stringent hybridization conditions are 4X SSC at 50°C or
hybridization with 30-40% formamide at 42°C. The use of all such
thioredoxin-like sequences is believed to be encompassed in this
invention.

Construction of a fusion sequence of the present invention,
which comprises the DNA sequence of a selected peptide or protein
and the DNA sequence of a thioredoxin-like sequence, employs
conventional genetic engineering techniques [see, Sambrook et al,
Molecular Cloning, A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York (1989)]. Fusion
sequences may be prepared in a number of different ways. For
example, the selected heterologous protein may be fused to the
amino terminus of the thioredoxin-like molecule. Alternatively,
the selected protein sequence may be fused to the carboxyl
terminus of the thioredoxin-like molecule. Small peptide
sequences could also be fused to either of the above-mentioned
positions of the thioredoxin-like sequence to produce them in a
structurally unconstrained manner.

This fusion of a desired heterologous peptide or protein to
the thioredoxin-like protein increases the stability of the
peptide or protein. At either the amino or carboxyl terminus,
the desired heterologous peptide or protein is fused in such a
manner that the fusion does not destabilize the native structure
of either protein. Additionally, fusion to the soluble
thioredoxin-like protein improves the solubility of the selected
heterologous peptide or protein.

It may be preferred for a variety of reasons that peptides
be fused within the active site loop of the thioredoxin-like
molecule. The face of thioredoxin surrounding the active site
loop has evolved, in keeping with the protein's major function
as a nonspecific protein disulfide oxido-reductase, to be able
to interact with a wide variety of protein surfaces. The active
site loop region is found between segments of strong secondary
structure and offers many advantages for peptide fusions. A
small peptide inserted into the active-site loop of a
thioredoxin-like protein is present in a region of the protein
which is not involved in maintaining tertiary structure.
Therefore the structure of such a fusion protein is stable.
Previous work has shown that E. coli thioredoxin can be cleaved
into two fragments at a position close to the active site loop,
and yet the tertiary interactions stabilizing the protein remain.

The active site loop of E. coli thioredoxin (SEQ ID NO: 22)
has the sequence NH2...Cys-Gly-Pro-Cys...COOH. Fusing a
selected peptide with a thioredoxin-like protein in the active
site loop portion of the protein constrains the peptide at both
ends, reducing the degrees of conformational freedom of the
peptide, and consequently reducing the number of alternative
structures taken by the peptide. The inserted peptide is bound at
each end by cysteine residues, which may form a disulfide linkage
to each other as they do in native thioredoxin and further limit
the conformational freedom of the inserted peptide.
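
At the protein-sequence level, an insertion into the
Cys-Gly-Pro-Cys loop can be sketched as a string operation. The
sketch below is illustrative only: the insertion point is assumed
here to lie between the Gly and Pro residues, the scaffold is a
short toy fragment and not the real thioredoxin sequence, and the
function name is hypothetical.

```python
LOOP_MOTIF = "CGPC"  # Cys-Gly-Pro-Cys active site loop, one-letter code


def insert_into_active_site_loop(scaffold: str, peptide: str) -> str:
    """Insert `peptide` between Gly and Pro of the single CGPC motif.

    The Gly/Pro insertion point is an assumption made for illustration.
    """
    idx = scaffold.find(LOOP_MOTIF)
    if idx == -1:
        raise ValueError("scaffold has no Cys-Gly-Pro-Cys motif")
    if scaffold.find(LOOP_MOTIF, idx + 1) != -1:
        raise ValueError("scaffold has more than one Cys-Gly-Pro-Cys motif")
    split = idx + 2  # position just after Cys-Gly
    return scaffold[:split] + peptide + scaffold[split:]


if __name__ == "__main__":
    toy_scaffold = "AEWCGPCKMI"  # toy fragment, not the real sequence
    fusion = insert_into_active_site_loop(toy_scaffold, "DDDDK")
    # The insert remains flanked by the two loop cysteines.
    print(fusion)
```

In the resulting sequence the inserted peptide is still bracketed
by the two loop cysteines, which is what allows a disulfide
linkage between them to constrain the insert at both ends.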

Moreover, this invention places the peptide on the surface
of the thioredoxin-like protein. Thus the invention provides a
distinct advantage for use of the peptides in screening for
bioactive peptide conformations and other assays by presenting
peptides inserted in the active site loop in this structural
context.

Additionally, the fusion of a peptide into the loop protects
it from the actions of E. coli amino- and carboxyl-peptidases.
Further, a restriction endonuclease cleavage site, RsrII, already
exists in the portion of the E. coli thioredoxin DNA sequence
(SEQ ID NO: 21) encoding the loop region at precisely the correct
position for a peptide fusion [see Figure 4]. RsrII recognizes
the DNA sequence CGG(A/T)CCG, leaving a three-nucleotide-long
5'-protruding sticky end. DNA bearing the complementary sticky
ends will therefore insert at this site in just one orientation.
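
The reason a three-nucleotide overhang forces a single
orientation can be sketched in a few lines: an odd-length overhang
can never equal its own reverse complement, so the two ends
produced by the cut differ. The sketch below locates CGG(A/T)CCG
sites and reports the overhang. The cut position (after the
leading CG on each strand) is taken from general knowledge of
RsrII and is not stated in this specification, and the example
DNA is one possible coding chosen for illustration, not a quotation
of SEQ ID NO: 21.

```python
import re

# Lookahead so that overlapping sites are also reported.
RSRII_SITE = re.compile(r"(?=CGG[AT]CCG)")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def rsrii_overhangs(dna: str) -> list[tuple[int, str]]:
    """Return (site_start, overhang) for each CGG(A/T)CCG site.

    Assumes the enzyme cuts after the leading CG, so the overhang is
    the next three bases read 5'->3' on the top strand.
    """
    seq = dna.upper()
    hits = []
    for match in RSRII_SITE.finditer(seq):
        cut = match.start() + 2
        hits.append((match.start(), seq[cut:cut + 3]))
    return hits


def is_directional(overhang: str) -> bool:
    """True if the overhang differs from its own reverse complement."""
    return reverse_complement(overhang) != overhang


if __name__ == "__main__":
    # Illustrative coding of Trp-Cys-Gly-Pro-Cys-Lys (an assumption).
    example = "TGGTGCGGTCCGTGCAAA"
    for start, overhang in rsrii_overhangs(example):
        print(start, overhang, reverse_complement(overhang),
              is_directional(overhang))
```

Because the middle base of the overhang is A or T and pairs only
with its complement, the two sticky ends are always distinct, and
an insert carrying the matching ends can ligate in only one
orientation.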

A fusion sequence of a thioredoxin-like sequence and a
desired protein or peptide sequence according to this invention
may optionally contain a linker peptide inserted between the
thioredoxin-like sequence and the selected heterologous peptide
or protein. This linker sequence may encode, if desired, a
polypeptide which is selectably cleavable or digestible by
conventional chemical or enzymatic methods. For example, the
selected cleavage site may be an enzymatic cleavage site.
Examples of enzymatic cleavage sites include sites for cleavage
by a proteolytic enzyme, such as enterokinase, Factor Xa,
trypsin, collagenase, and thrombin. Alternatively, the cleavage
site in the linker may be a site capable of being cleaved upon
exposure to a selected chemical, e.g., cyanogen bromide,
hydroxylamine, or low pH.

Cleavage at the selected cleavage site enables separation
of the heterologous protein or peptide from the thioredoxin
fusion protein to yield the mature heterologous peptide or
protein. The mature peptide or protein may then be obtained in
purified form, free from any polypeptide fragment of the
thioredoxin-like protein to which it was previously linked. The
cleavage site, if inserted into a linker useful in the fusion
sequences of this invention, does not limit this invention. Any
desired cleavage site, of which many are known in the art, may
be used for this purpose.
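
The partner/linker/target arrangement and its cleavage can be
sketched at the sequence level. The sketch below assumes an
enterokinase site as the linker's cleavage site, written as
Asp-Asp-Asp-Asp-Lys with cleavage after the Lys, which is the
commonly cited recognition sequence and is not spelled out in this
specification. The partner and target sequences are toy
placeholders and the function names are hypothetical.

```python
ENTEROKINASE_SITE = "DDDDK"  # assumed recognition sequence; cut after K


def build_fusion(partner: str, linker: str, target: str) -> str:
    """Concatenate fusion partner, linker and target, N- to C-terminus."""
    return partner + linker + target


def cleave_after_site(fusion: str,
                      site: str = ENTEROKINASE_SITE) -> tuple[str, str]:
    """Split a fusion after the first occurrence of the cleavage site.

    Returns (partner_plus_linker, released_target).
    """
    idx = fusion.find(site)
    if idx == -1:
        raise ValueError("cleavage site not found in fusion sequence")
    cut = idx + len(site)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    toy_partner = "MSDKWCGPCLA"       # toy stand-in, not real thioredoxin
    toy_linker = "GSG" + ENTEROKINASE_SITE
    toy_target = "PGPPPG"             # toy stand-in for the mature protein
    fusion = build_fusion(toy_partner, toy_linker, toy_target)
    carrier, released = cleave_after_site(fusion)
    print(carrier, released)
```

Placing the site at the C-terminal end of the linker means the
released target carries no residues from the linker or the
thioredoxin-like partner, matching the description of a mature
protein free of any fragment of the fusion partner.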

The optional linker sequence of a fusion sequence of the
present invention may serve a purpose other than the provision
of a cleavage site. The linker may also be a simple amino acid
sequence of a sufficient length to prevent any steric hindrance
between the thioredoxin-like molecule and the selected
heterologous peptide or protein.

Whether or not such a linker sequence is necessary will
depend upon the structural characteristics of the selected
heterologous peptide or protein and whether or not the resulting
fusion protein is useful without cleavage. For example, where
the thioredoxin-like sequence is a human sequence, the fusion
protein may itself be useful as a therapeutic or as a vaccine
without cleavage of the selected protein or peptide therefrom.
Alternatively, where the mature protein sequence may be naturally
cleaved, no linker may be needed.

In one embodiment, therefore, the fusion sequence of this
invention contains a thioredoxin-like sequence fused directly at
its amino or carboxyl terminal end to the sequence of the
selected peptide or protein. The resulting fusion protein is
thus a soluble cytoplasmic fusion protein. In another
embodiment, the fusion sequence further comprises a linker
sequence interposed between the thioredoxin-like sequence and the
selected peptide or protein sequence. This fusion protein is
also produced as a soluble cytoplasmic protein. Similarly, where
the selected peptide sequence is inserted into the active site
loop region or elsewhere within the thioredoxin-like sequence,
a cytoplasmic fusion protein is produced.

The cytoplasmic fusion protein can be purified by
conventional means. Preferably, as a novel aspect of the present
invention, several thioredoxin fusion proteins of this invention
may be purified by exploiting an unusual property of thioredoxin.
The cytoplasm of E. coli is effectively isolated from the
external medium by a cell envelope comprising two membranes,
inner and outer, separated from each other by a periplasmic space
within which lies a rigid peptidoglycan cell wall. The
peptidoglycan wall contributes both shape and strength to the
cell. At certain locations in the cell envelope there are "gaps"
(called variously Bayer patches, Bayer junctions or adhesion
sites) in the peptidoglycan wall where the inner and outer
membranes appear to meet and perhaps fuse together. See M. E.
Bayer, J. Bacteriol., 23:1104-1112 (1967) and J. Gen. Microbiol.,
11:395-404 (1968). Most of the cellular thioredoxin lies loosely
associated with the inner surface of the membrane at these
adhesion sites and can be quantitatively expelled from the cell
through these adhesion sites by a sudden osmotic shock or by a
simple freeze/thaw procedure. See C. A. Lunn and V. P. Pigiet,
J. Biol. Chem., 257:11424-11430 (1982) and in "Thioredoxin and
Glutaredoxin Systems: Structure and Function", pp. 165-176 (1986),
ed. A. Holmgren et al, Raven Press, New York. To a lesser extent
some EF-Tu (elongation factor-Tu) can be expelled in the same way
[Jacobson et al, Biochemistry, 15:2297-2302 (1976)], but, with
the exception of the periplasmic contents, the vast majority of
E. coli proteins cannot be released by these treatments.

Although there have been reports of the release by osmotic
shock of a limited number of heterologous proteins produced in
the cytoplasm of E. coli [Denefle et al, Gene, 85:499-510 (1989);
Joseph-Liauzun et al, Gene, 86:291-295 (1990); Rosenwasser et al,
J. Biol. Chem., 265:13066-13073 (1990)], the ability to be so
released is a rare and desirable property not shared by the
majority of heterologous proteins. Fusion of a selected, desired
heterologous protein to thioredoxin as described by the present
invention not only enhances its expression, solubility and
stability as described above, but may also provide for its
release from the cell by osmotic shock or freeze/thaw treatments,
greatly simplifying its purification. The thioredoxin portion
of the fusion protein in some cases, e.g., with MIP, directs the
fusion protein towards the adhesion sites, from where it can be
released to the exterior by these treatments.

In another embodiment, the present invention may employ
another component, that is, a secretory leader sequence, among
which many are known in the art, e.g., leader sequences of phoA,
MBP, β-lactamase, operatively linked in frame to the fusion
protein of this invention to enable the expression and secretion
of the mature fusion protein into the bacterial periplasmic space
or culture medium. This leader sequence may be fused to the
amino terminus of the thioredoxin-like molecule when the selected
peptide or protein sequence is fused to the carboxyl terminus or
to an internal site within the thioredoxin-like sequence. An
optional linker could also be present when the peptide or protein
is fused at the carboxyl terminus. It is expected that this
fusion sequence construct, when expressed in an appropriate host
cell, would be expressed as a secreted fusion protein rather than
a cytoplasmic fusion protein. However, stability, solubility and
high expression should characterize fusion proteins produced
using any of these alternative embodiments.

This invention is not limited to any specific type of
peptide or protein. A wide variety of heterologous (i.e.,
foreign in reference to the host genome) genes or gene fragments
are useful in forming the fusion sequences of the present
invention. Any selected, desired DNA sequence could be used.
While the compositions and methods of this invention are most
useful for peptides or proteins which are not expressed,
expressed in inclusion bodies, or expressed in very small amounts
in bacterial and yeast hosts, the heterologous, selected, desired
peptides or proteins can include any peptide or protein useful
for human or veterinary therapy, diagnostic or research
applications in any expression system. For example, hormones,
cytokines, growth or inhibitory factors, enzymes, modified or
wholly synthetic proteins or peptides can be produced according
to this invention in bacterial, yeast, mammalian or other
eukaryotic cells and expression systems suitable therefor.

In the examples below illustrating this invention, the
proteins expressed by this invention include IL-11, MIP-1α, IL-6,
M-CSF, a bone inductive factor called BMP-2, IL-2, IL-3, IL-4,
IL-5, LIF, Steel Factor, MIF (macrophage inhibitory factor) and
a variety of small peptides of random sequence. These proteins
include examples of proteins which, when expressed without a
thioredoxin fusion partner, are unstable in E. coli or are found
in inclusion bodies.

A variety of DNA molecules incorporating the above-described
fusion sequences may be constructed for expressing the selected
peptide or protein according to this invention. At a minimum a
desirable DNA sequence according to this invention comprises a
fusion sequence described above, in association with, and under
the control of, an expression control sequence capable of
directing the expression of the fusion protein in a desired host
cell. For example, where the host cell is an E. coli strain, the
DNA molecule desirably contains a promoter which functions in E.
coli, a ribosome binding site, and optionally, a selectable
marker gene and an origin of replication if the DNA molecule is
extra-chromosomal. Numerous bacterial expression vectors
containing these components are known in the art for bacterial
expression, and can easily be constructed by standard molecular
biology techniques. Similarly, known yeast and mammalian cell
vectors and vector components may be utilized where the host cell
is a yeast cell or a mammalian cell.

The DNA molecules containing the fusion sequences may be
further modified to contain different codons to optimize
expression in the selected host cell, as is known in the art.

These DNA molecules may additionally contain multiple copies
of the thioredoxin-like DNA sequence, with the heterologous
protein fused to only one of the DNA sequences, or with the
heterologous protein fused to all copies of the thioredoxin-like
sequence. It may also be possible to integrate a
thioredoxin-like/heterologous peptide or protein-encoding fusion
sequence into the chromosome of a selected host to either replace
or duplicate a native thioredoxin-like sequence.

Host cells suitable for the present invention are preferably
bacterial cells. For example, the various strains of E. coli
(e.g., HB101, W3110 and strains used in the following examples)
are well-known as host cells in the field of biotechnology. E.
coli strain GI724, used in the following examples, has been
deposited with a United States microorganism depository as
described in detail below. Various strains of B. subtilis,
Pseudomonas, and other bacteria may also be employed in this
method.

Many strains of yeast and other eukaryotic cells known to
those skilled in the art may also be useful as host cells for
expression of the polypeptides of the present invention. For
example, Saccharomyces cerevisiae strain EGY-40 has been used by
these inventors as a host cell in the production of various small
peptide/thioredoxin fusions. It could preferably be used instead
of E. coli as a host cell in the production of any of the
proteins exemplified herein. Similarly, known mammalian cells
may also be employed in the expression of these fusion proteins.

To produce the fusion protein of this invention, the host
cell is either transformed with, or has integrated into its
genome, a DNA molecule comprising a thioredoxin-like DNA sequence
fused to the DNA sequence of a selected heterologous peptide or
protein, desirably under the control of an expression control
sequence capable of directing the expression of a fusion protein.
The host cell is then cultured under known conditions suitable
for fusion protein production. If the fusion protein accumulates
in the cytoplasm of the cell, it may be released by conventional
bacterial cell lysis techniques and purified by conventional
procedures including selective precipitations, solubilizations
and column chromatographic methods. If a secretory leader is
incorporated into the fusion molecule, substantial purification
is achieved when the fusion protein is secreted into the
periplasmic space or the growth medium.

Alternatively, for cytoplasmic thioredoxin fusion proteins,
a selective release from the cell may be achieved by osmotic
shock or freeze/thaw procedures. Although final purification is
still required for most purposes, the initial purity of fusion
proteins in preparations resulting from these procedures is
superior to that obtained in conventional whole cell lysates,
reducing the number of subsequent purification steps required to
attain homogeneity. In a typical osmotic shock procedure, the
packed cells containing the fusion protein are resuspended on ice
in a buffer containing EDTA and having a high osmolarity, usually
due to the inclusion in the buffer of a solute, such as 20% w/v
sucrose, which cannot readily cross the cytoplasmic membrane.
During a brief incubation on ice the cells plasmolyze as water
leaves the cytoplasm down the osmotic gradient. The cells are
then switched into a buffer of low osmolarity, and during the
osmotic re-equilibration both the contents of the periplasm and
proteins localized at the Bayer patches are released to the
exterior. A simple centrifugation following this release removes
the majority of bacterial cell-derived contaminants from the
fusion protein preparation. Alternatively, in a freeze/thaw
procedure the packed cells containing the fusion protein are
first resuspended in a buffer containing EDTA and are then
frozen. Fusion protein release is subsequently achieved by
allowing the frozen cell suspension to thaw. The majority of
contaminants can be removed as described above by a
centrifugation step. The fusion protein is further purified by
well-known conventional methods.

These treatments typically release at least 30% of the
fusion proteins without lysing the cell cultures. The success
of these procedures in releasing significant amounts of several
thioredoxin fusion proteins is surprising, since such techniques
are not generally successful with a wide range of proteins. The
ability of these fusion proteins to be substantially purified by
such treatments, which are significantly simpler and less
expensive than the purification methods required by other fusion
protein systems, may provide the fusion proteins of the invention
with a significant advantage over other systems which are used
to produce proteins in E. coli.

The resulting fusion protein is stable and soluble, often
with the heterologous peptide or protein retaining its
bioactivity. The heterologous peptide or protein may optionally
be separated from the thioredoxin-like protein by cleavage, as
discussed above.

In the specific and illustrative embodiments of the
compositions and methods of this invention, the E. coli
thioredoxin (trxA) gene (SEQ ID NO:21) has been cloned and placed
in an E. coli expression system. An expression plasmid,
pALtrxA-781, was constructed. A related plasmid containing
modified IL-11 fused to the thioredoxin sequence, called
pALtrxA/EK/IL11ΔPro-581 (SEQ ID NO:13 and SEQ ID NO:14), is
described below in Example 1 and in Fig. 1. A modified version
of this plasmid containing a different ribosome binding site was
employed in the other examples and is specifically described in
Example 3. Other conventional vectors may be employed in this
invention. The invention is not limited to the plasmids described
in these examples.

Plasmid pALtrxA-781 (without the modified IL-11) directs the
accumulation of >10% of the total cell protein as thioredoxin in
E. coli host strain GI724. Examples 2 through 6 describe the use
of this plasmid to form and express thioredoxin fusion proteins
with BMP-2 (SEQ ID NO:18), IL-6 (SEQ ID NO:20) and MIP-1α (SEQ ID
NO:16), which are polypeptides.

As an example of the expression of small peptides inserted
into the active-site loop, a derivative of pALtrxA-781 has been
constructed in which a 13 amino-acid linker peptide sequence
containing a cleavage site for the specific protease enterokinase
[Leipnieks and Light, J. Biol. Chem., 254:1677-1683 (1979)] has
been fused into the active-site loop of thioredoxin. This
plasmid (pALtrxA-EK) directs the accumulation of >10% of the
total cell protein as the fusion protein. The fusion protein is
all soluble, indicating that it has probably adopted a 'native'
tertiary structure. It is as stable as wild-type
thioredoxin to prolonged incubations at 80 °C, suggesting that the
strong tertiary structure of thioredoxin has not been compromised
by the insertion into the active-site loop. The fusion protein
is specifically cleaved by enterokinase, whereas thioredoxin is
not, indicating that the peptide inserted into the active-site
loop is present on the surface of the fusion protein.
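
The selectivity of the enterokinase cleavage can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the recognition site DDDDK is the one given in Example 1, cleavage is assumed to occur immediately after the lysine of that site, and the carrier and payload strings are hypothetical placeholders rather than sequences of the invention.

```python
from typing import List

# Enterokinase recognition site as given in Example 1.
# Assumption: the cut falls immediately after the final lysine (K).
EK_SITE = "DDDDK"


def cleave_at_enterokinase(sequence: str) -> List[str]:
    """Split a one-letter amino acid sequence after every DDDDK site."""
    fragments = []
    start = 0
    pos = sequence.find(EK_SITE, start)
    while pos != -1:
        cut = pos + len(EK_SITE)
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(EK_SITE, start)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical placeholder sequences, for illustration only.
carrier = "MSAAAA"   # stands in for a thioredoxin-like partner
payload = "GPPPGS"   # stands in for a heterologous peptide

fusion = carrier + EK_SITE + payload
print(cleave_at_enterokinase(fusion))   # two fragments: site present
print(cleave_at_enterokinase(carrier))  # one fragment: no site, no cut
```

A sequence lacking the site is returned uncut, which mirrors the observation that the fusion protein is cleaved whereas thioredoxin itself is not.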

As described in more detail in Example 12 below, fusions of
small peptides (SEQ ID NO:1 through SEQ ID NO:12) were made into
the active-site loop of thioredoxin. The inserted peptides were
14 residues long and were of totally random composition to test
the ability of the system to deal with hydrophobic, hydrophilic
and neutral sequences.
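
A rough sense of how random 14-residue inserts spread across hydrophobic, hydrophilic and neutral character can be obtained with the following Python sketch. It is not the procedure used to generate the peptides of SEQ ID NO:1 through SEQ ID NO:12. It assumes uniform sampling over the 20 standard amino acids, scores each peptide by its mean Kyte-Doolittle hydropathy (a scale chosen here for illustration, not named in this specification), and uses an arbitrary cutoff of 0.5 to separate the three classes.

```python
import random
from typing import Optional

# Kyte-Doolittle hydropathy values (illustrative choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
RESIDUES = sorted(KD)


def random_peptide(length: int = 14,
                   rng: Optional[random.Random] = None) -> str:
    """Draw a peptide with each residue chosen uniformly at random."""
    rng = rng or random.Random()
    return "".join(rng.choice(RESIDUES) for _ in range(length))


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value over all residues."""
    return sum(KD[aa] for aa in peptide) / len(peptide)


def classify(peptide: str, cutoff: float = 0.5) -> str:
    """Label a peptide; the cutoff is an arbitrary illustrative value."""
    score = mean_hydropathy(peptide)
    if score > cutoff:
        return "hydrophobic"
    if score < -cutoff:
        return "hydrophilic"
    return "neutral"


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(5):
    pep = random_peptide(rng=rng)
    print(pep, round(mean_hydropathy(pep), 2), classify(pep))
```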

The methods and compositions of this invention permit the
production of proteins and peptides useful in research,
diagnostic and therapeutic fields. The production of fusion
proteins according to this invention has a number of advantages.
As one example, the production of a selected protein by the
present invention as a carboxyl-terminal fusion to E. coli
thioredoxin (SEQ ID NO:21), or another thioredoxin-like protein,
enables avoidance of translation initiation problems often
encountered in the production of eukaryotic proteins in E. coli.
Additionally, the initiator methionine usually remaining on the
amino terminus of the heterologous protein is not present and
does not have to be removed when the heterologous protein is made
as a carboxyl-terminal thioredoxin fusion.

The production of fusion proteins according to this
invention reliably improves solubility of desired heterologous
proteins and enhances their stability to proteases in the
expression system. This invention also enables high level
expression of certain desirable therapeutic proteins, e.g.,
IL-11, which are otherwise produced at low levels in bacterial
host cells.

This invention may also confer heat stability to the fusion
protein, especially if the heterologous protein itself is heat
stable. Because thioredoxin, and presumably all thioredoxin-like
proteins, are heat stable up to 80 °C, the present invention may
enable the use of a simple heat treatment as an initial effective
purification step for some thioredoxin fusion proteins.

In addition to providing high levels of the selected
heterologous proteins or peptides upon cleavage from the fusion
protein for therapeutic or other uses, the fusion proteins or
fusion peptides of the present invention may themselves be useful
as therapeutics provided the thioredoxin-like protein is not
antigenic to the animal being treated. Further, the
thioredoxin-like fusion proteins may provide a vehicle for the
delivery of bioactive peptides. As one example, human thioredoxin
would not be antigenic in humans, and therefore a fusion protein
of the present invention with human thioredoxin may be useful as a
vehicle for delivering to humans the biologically active peptide
to which it is fused. Because human thioredoxin is an
intracellular protein, human thioredoxin fusion proteins may be
produced in an E. coli intracellular expression system. Thus
this invention also provides a method for delivering biologically
active peptides or proteins to a patient in the form of a fusion
protein with an acceptable thioredoxin-like protein.

The present invention also provides methods and reagents for
screening libraries of random peptides for their potential enzyme
inhibitory, hormone/growth factor agonist and hormone/growth
factor antagonist activity. Also provided are methods and
reagents for the mapping of known protein sequences for regions
of potential interest, including receptor binding sites,
substrate binding sites, phosphorylation/modification sites,
protease cleavage sites, and epitopes.

Bacterial colonies expressing thioredoxin-like/random
peptide fusion proteins may be screened using radiolabeled
proteins such as hormones or growth factors as probes. Positives
arising from this type of screen would identify mimics of
receptor binding sites and may lead to the design of compounds
with therapeutic uses. Bacterial colonies expressing
thioredoxin-like/random peptide fusion proteins may also be
screened using antibodies raised against native, active hormones
or growth factors. Positives arising from this type of screen
could be mimics of surface epitopes present on the original
antigen. Where such surface epitopes are responsible for
receptor binding, the 'positive' fusion proteins would have
biological activity.

Additionally, the thioredoxin-like fusion proteins or fusion
peptides of this invention may also be employed to develop
monoclonal and polyclonal antibodies, or recombinant antibodies
or chimeric antibodies, generated by known methods for
diagnostic, purification or therapeutic use. Studies of
thioredoxin-like molecules indicate a possible B cell/T cell
growth factor activity [N. Wakasuki et al, cited above], which
may enhance immune response. The fusion proteins or peptides of
the present invention may be employed as antigens to elicit
desirable antibodies, which themselves may be further manipulated
by known techniques into monoclonal or recombinant antibodies.

Alternatively, antibodies elicited to thioredoxin-like
sequences may also be useful in the purification of many
different thioredoxin fusion proteins. The following examples
illustrate embodiments of the present invention, but are not
intended to limit the scope of the disclosure.

EXAMPLE 1 - THIOREDOXIN/IL-11 FUSION MOLECULE

A thioredoxin-like fusion molecule of the present invention
was prepared using E. coli thioredoxin as the thioredoxin-like
sequence and recombinant IL-11 [Paul et al, Proc. Natl. Acad.
Sci. U.S.A., 87:7512-7516 (1990); see also, copending United
States Patent Applications SN 07/526,474, and SN 07/441,100 and
PCT Patent publication WO91/0749, published May 30, 1991,
incorporated herein by reference] as the selected heterologous
protein. The E. coli thioredoxin (trxA) gene (SEQ ID NO:21) was
cloned based on its published sequence and employed to construct
various related E. coli expression plasmids using standard DNA
manipulation techniques, described extensively by Sambrook,
Fritsch and Maniatis, Molecular Cloning. A Laboratory Manual, 2nd
edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
(1989).

A first expression plasmid, pALtrxA-781, was constructed
containing the E. coli trxA gene without fusion to another
sequence. This plasmid further contained sequences which are
described in detail below for the related IL-11 fusion plasmid.
This first plasmid, which directs the accumulation of >10% of the
total cell protein as thioredoxin in E. coli host strain
GI724, was further manipulated as described below for the
construction of a trxA/IL-11 fusion sequence.

The entire sequence of the related plasmid expression
vector, pALtrxA/EK/IL11ΔPro-581 (SEQ ID NO:13 and SEQ ID NO:14),
is illustrated in Fig. 1 and contains the following principal
features:

Nucleotides 1-2060 contain DNA sequences originating from
the plasmid pUC-18 [Norrander et al, Gene, 26:101-106 (1983)]
including sequences containing the gene for β-lactamase which
confers resistance to the antibiotic ampicillin in host E. coli
strains, and a colE1-derived origin of replication. Nucleotides
2061-2221 contain DNA sequences for the major leftward promoter
(pL) of bacteriophage λ [Sanger et al, J. Mol. Biol., 162:729-773
(1982)], including three operator sequences, OL1, OL2 and OL3.
The operators are the binding sites for λcI repressor protein,
intracellular levels of which control the amount of transcription
initiation from pL. Nucleotides 2222-2241 contain a strong
ribosome binding sequence derived from that of gene 10 of
bacteriophage T7 [Dunn and Studier, J. Mol. Biol., 166:477-535
(1983)].

Nucleotides 2242-2568 contain a DNA sequence encoding the
E. coli thioredoxin protein (SEQ ID NO:21) [Lim et al, J.
Bacteriol., 163:311-316 (1985)]. There is no translation
termination codon at the end of the thioredoxin coding sequence
in this plasmid.

Nucleotides 2569-2583 contain DNA sequence encoding the
amino acid sequence for a short, hydrophilic, flexible spacer
peptide "--GSGSG--". Nucleotides 2584-2598 provide DNA sequence
encoding the amino acid sequence for the cleavage recognition
site of enterokinase (EC 3.4.4.8), "--DDDDK--" [Maroux et al, J.
Biol. Chem., 246:5031-5039 (1971)].

Nucleotides 2599-3132 contain DNA sequence encoding the
amino acid sequence of a modified form of mature human IL-11
[Paul et al, Proc. Natl. Acad. Sci. USA, 87:7512-7516 (1990)],
deleted for the N-terminal prolyl-residue normally found in the
natural protein. The sequence includes a translation termination
codon at the 3'-end of the IL-11 sequence.

Nucleotides 3133-3159 provide a "Linker" DNA sequence
containing restriction endonuclease sites. Nucleotides 3160-3232
provide a transcription termination sequence based on that of the
E. coli aspA gene [Takagi et al, Nucl. Acids Res., 13:2063-2074
(1985)]. Nucleotides 3233-3632 are DNA sequences derived from
pUC-18.
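
The feature coordinates listed above can be checked for internal consistency with a short Python sketch. The coordinates are transcribed from the description of the plasmid; the feature labels, the helper names and the 1-based inclusive reading of each range are assumptions made for illustration.

```python
# Feature map transcribed from the description of the plasmid.
# Coordinates are read as 1-based and inclusive (an assumption).
FEATURES = [
    ("pUC-18 sequences (beta-lactamase, colE1 origin)", 1, 2060),
    ("pL promoter with operators", 2061, 2221),
    ("T7 gene 10 ribosome binding sequence", 2222, 2241),
    ("thioredoxin coding sequence (no stop codon)", 2242, 2568),
    ("spacer peptide", 2569, 2583),
    ("enterokinase recognition site", 2584, 2598),
    ("modified IL-11 with stop codon", 2599, 3132),
    ("linker with restriction sites", 3133, 3159),
    ("aspA transcription terminator", 3160, 3232),
    ("pUC-18 sequences", 3233, 3632),
]


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive coordinate range."""
    return end - start + 1


def is_contiguous(features) -> bool:
    """True when every feature begins right after the previous one."""
    return all(nxt[1] == cur[2] + 1
               for cur, nxt in zip(features, features[1:]))


for name, start, end in FEATURES:
    print(f"{start:>5}-{end:<5} {span_length(start, end):>5} nt  {name}")

# The fused reading frame runs from the start of thioredoxin to the
# IL-11 stop codon, so its length should be a whole number of codons.
orf_nt = span_length(2242, 3132)
print("contiguous:", is_contiguous(FEATURES))
print("fused reading frame:", orf_nt, "nt =", orf_nt // 3, "codons")
```

Under this reading the ten features tile the plasmid without gaps or overlaps, and each coding segment (thioredoxin, spacer, enterokinase site, IL-11) is a multiple of three nucleotides, as expected for an in-frame fusion.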

As described in Example 2 below, when cultured under the
appropriate conditions in a suitable E. coli host strain, this
plasmid vector can direct the production of high levels
(approximately 10% of the total cellular protein) of a
thioredoxin/IL-11 fusion protein. By contrast, when not fused
to thioredoxin, IL-11 accumulated to only 0.2% of the total
cellular protein when expressed in an analogous host/vector
system.

EXAMPLE 2 - EXPRESSION OF A FUSION PROTEIN

A thioredoxin/IL-11 fusion protein was produced according
to the following protocol using the plasmid constructed as
described in Example 1. pALtrxA/EK/IL11ΔPro-581 (SEQ ID NO:13)
was transformed into the E. coli host strain GI724 (F-, lacIq,
lacPL8, ampC::λcI+) by the procedure of Dagert and Ehrlich, Gene,
6:23 (1979). The untransformed host strain E. coli GI724 was
deposited with the American Type Culture Collection, 12301
Parklawn Drive, Rockville, Maryland on January 31, 1991 under
ATCC No. 55151 for patent purposes pursuant to applicable laws
and regulations. Transformants were selected on 1.5% w/v agar
plates containing IMC medium, which is composed of M9 medium
[Miller, "Experiments in Molecular Genetics", Cold Spring Harbor
Laboratory, New York (1972)] supplemented with 0.5% w/v glucose,
0.2% w/v casamino acids and 100 µg/ml ampicillin.

GI724 contains a copy of the wild-type λcI repressor gene
stably integrated into the chromosome at the ampC locus, where
it has been placed under the transcriptional control of
Salmonella typhimurium trp promoter/operator sequences. In
GI724, λcI protein is made only during growth in tryptophan-free
media, such as minimal media or a minimal medium supplemented
with casamino acids such as IMC, described above. Addition of
tryptophan to a culture of GI724 will repress the trp promoter
and turn off synthesis of λcI, gradually causing the induction
of transcription from pL promoters if they are present in the
cell.
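
The regulatory logic of the GI724 host described above amounts to two inversions in series, which the following Python sketch makes explicit. It is a deliberately simplified steady-state model: it ignores the gradual decay of pre-existing repressor after tryptophan addition, and the function names are hypothetical.

```python
def ci_repressor_made(tryptophan_present: bool) -> bool:
    """The cI gene sits behind the trp promoter, which tryptophan
    represses, so repressor is made only in tryptophan-free medium."""
    return not tryptophan_present


def pl_transcription_on(tryptophan_present: bool) -> bool:
    """The pL promoter is shut off whenever cI repressor is present."""
    return not ci_repressor_made(tryptophan_present)


for trp in (False, True):
    state = "induced" if pl_transcription_on(trp) else "repressed"
    print(f"tryptophan present={trp}: pL {state}")
```

The two inversions cancel, so adding tryptophan switches on transcription from pL, which is why tryptophan serves as the inducer in this system.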

GI724 transformed with pALtrxA/EK/IL11ΔPro-581 (SEQ ID NO:13
and SEQ ID NO:14) was grown at 37 °C to an absorbance of 0.5 in
IMC medium. Tryptophan was added to a final concentration of 100
µg/ml and the culture incubated for a further 4 hours. During
this time thioredoxin/IL-11 fusion protein accumulated to
approximately 10% of the total cell protein.

All of the fusion protein was found to be in the soluble
cellular fraction, and was purified as follows. Cells were lysed
in a French pressure cell at 20,000 psi in 50 mM HEPES pH 8.0,
1 mM phenylmethylsulfonyl fluoride. The lysate was clarified by
centrifugation at 15,000 x g for 30 minutes and the supernatant
loaded onto a QAE-Toyopearl column. The flow-through fractions
were discarded and the fusion protein eluted with 50 mM HEPES pH
8.0, 100 mM NaCl. The eluate was adjusted to 2 M NaCl and loaded
onto a column of phenyl-Toyopearl. The flow-through fractions
were again discarded and the fusion protein eluted with 50 mM
HEPES pH 8.0, 0.5 M NaCl.

The fusion protein was then dialyzed against 25 mM HEPES pH
8.0 and was >80% pure at this stage. By T1165 bioassay [Paul et
al, cited above] the purified thioredoxin/IL-11 protein exhibited
an activity of SxlO^/mg. This value agrees closely on a molar 
basis with the activity of 2x10^6 U/mg found for COS cell-derived
IL-11 in the same assay. One milligram of the fusion protein was
cleaved at 37 °C for 20 hours with 1000 units of bovine
enterokinase [Leipnieks and Light, J. Biol. Chem., 254:1677-1683
(1979)] in 1 ml 10 mM Tris-Cl (pH 8.0)/10 mM CaCl2. IL-11 could be
recovered from the reaction products by passing them over a
QAE-Toyopearl column in 25 mM HEPES pH 8.0, where IL-11 was found
in the flow-through fractions. Uncleaved fusion protein,
thioredoxin and enterokinase remained bound on the column.

The IL-11 prepared in this manner had a bioactivity in the
T1165 assay of 2.5x10* U/mg. 
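
The comparison of activities "on a molar basis" can be made concrete with a short Python sketch. It is a back-of-envelope estimate under several assumptions that are not stated in this specification: residue counts are derived from the nucleotide ranges of Example 1, every residue is taken to contribute the same average mass (so that mass ratios equal residue-count ratios), and the COS cell-derived IL-11 is taken to have the same length as the IL-11 portion of the fusion.

```python
def codons(start: int, end: int) -> int:
    """Codon count for an inclusive nucleotide range."""
    n = end - start + 1
    if n % 3:
        raise ValueError("range is not a whole number of codons")
    return n // 3


# Residue counts derived from the coordinates given in Example 1.
TRX = codons(2242, 2568)       # thioredoxin, no stop codon
SPACER = codons(2569, 2583)    # flexible spacer peptide
EK = codons(2584, 2598)        # enterokinase recognition site
IL11 = codons(2599, 3132) - 1  # modified IL-11, minus the stop codon

fusion_residues = TRX + SPACER + EK + IL11
il11_mass_fraction = IL11 / fusion_residues


def expected_fusion_activity(il11_units_per_mg: float) -> float:
    """Specific activity expected for the fusion if each fusion
    molecule is exactly as active as one free IL-11 molecule."""
    return il11_units_per_mg * il11_mass_fraction


cos_il11 = 2e6  # U/mg, value reported for COS cell-derived IL-11
print("fusion length:", fusion_residues, "residues")
print("IL-11 share of fusion mass: %.2f" % il11_mass_fraction)
print("expected fusion activity: %.2e U/mg"
      % expected_fusion_activity(cos_il11))
```

Under these assumptions IL-11 makes up roughly 60% of the mass of the fusion, so a fully active fusion would be expected to show about 1.2 x 10^6 U/mg, that is, a lower activity per milligram than free IL-11 even at equal activity per molecule.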


EXAMPLE 3 - THIOREDOXIN/MIP-1α FUSION MOLECULE

Human macrophage inflammatory protein 1α (MIP-1α) (SEQ ID
NO:16) can be expressed at high levels in E. coli as a
thioredoxin fusion protein using an expression vector similar to
pALtrxA/EK/IL11ΔPro-581 described in Example 1 above but modified
in the following manner to replace the ribosome binding site of
bacteriophage T7 with that of λCII. In the plasmid of Example
1, nucleotides 2222 to 2241 were removed by conventional means.
Inserted in place of those nucleotides was a sequence of
nucleotides formed by nucleotides 35566 to 35472 and 38137 to
38361 from bacteriophage lambda as described in Sanger et al
(1982) cited above. This reference is incorporated by reference
for the purpose of disclosing this sequence. To express a
thioredoxin/MIP-1α fusion, the DNA sequence in the thus-modified
pALtrxA/EK/IL11ΔPro-581 encoding human IL-11 (nucleotides
2599-3132) is replaced by the 213 nucleotide DNA sequence (SEQ ID
NO:15) shown in Fig. 2 encoding full-length, mature human MIP-1α
[Nakao et al, Mol. Cell. Biol., 10:3646-3658 (1990)].

The host strain and expression protocol used for the
production of thioredoxin/MIP-1α fusion protein are as described
in Example 1. As was seen with the thioredoxin/IL-11 fusion
protein, all of the thioredoxin/MIP-1α fusion protein was found
in the soluble cellular fraction, representing up to 20% of the
total protein. Cells were lysed as in Example 1 to give a protein
concentration in the crude lysate of 10 mg/ml. This lysate was
then heated at 80 °C for 10 min to precipitate the majority of
contaminating E. coli proteins and was clarified by
centrifugation at 130,000 x g for 60 minutes. The pellet was
discarded and the supernatant loaded onto a Mono Q column. The
fusion protein eluted at approximately 0.5 M NaCl from this
column and was >80% pure at this stage. After dialysis to remove
salt the fusion protein could be cleaved by an enterokinase
treatment as described in Example 2 to release MIP-1α.

EXAMPLE 4 - THIOREDOXIN/BMP-2 FUSION MOLECULE

Human Bone Morphogenetic Protein 2 (BMP-2) can be expressed
at high levels in E. coli as a thioredoxin fusion protein using
the modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 (nucleotides 2599-3132) is replaced by
the 345 nucleotide DNA sequence (SEQ ID NO:17) shown in Fig. 3
encoding full-length, mature human BMP-2 [Wozney et al, Science,
242:1528-1534 (1988)].

In this case the thioredoxin/BMP-2 fusion protein appeared
in the insoluble cellular fraction when strain GI724 containing
the expression vector was grown in medium containing tryptophan
at 37 °C. However, when the temperature of the growth medium was
lowered to 20 °C the fusion protein was found in the soluble
cellular fraction.

EXAMPLE 5 - THIOREDOXIN/IL-2 FUSION MOLECULE

Murine interleukin 2 (IL-2) is produced at high levels in
a soluble form in E. coli as a thioredoxin fusion protein using
the modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 vector (nucleotides 2599-3132) is
replaced by the DNA sequence encoding murine IL-2, Genbank
Accession No. K02292, nucleotides 109 to 555. The
thioredoxin/IL-2 fusion gene is expressed under the conditions
described for thioredoxin/IL-11 in Example 2. The culture growth
temperature used in this case is 15 °C. Under these conditions
the majority of the thioredoxin/IL-2 fusion protein accumulates
in the soluble cellular fraction. The fusion protein can be
cleaved using the enterokinase treatment described in Example 2.

EXAMPLE 6 - THIOREDOXIN/IL-3 FUSION MOLECULE

Human interleukin 3 (IL-3) is produced at high levels in a
soluble form in E. coli as a thioredoxin fusion protein using the
modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 vector (nucleotides 2599-3132) is replaced
by the DNA sequence encoding human IL-3, Genbank Accession No.
M14743, nucleotides 67 to 465. The thioredoxin/IL-3 fusion gene
is expressed under the conditions described for thioredoxin/IL-11
in Example 2. The culture growth temperature used in this case
is 15 °C. Under these conditions the majority of the
thioredoxin/IL-3 fusion protein accumulates in the soluble
cellular fraction. The fusion protein can be cleaved using the
enterokinase treatment described in Example 2.

EXAMPLE 7 - THIOREDOXIN/IL-4 FUSION MOLECULE

Murine interleukin 4 (IL-4) is produced at high levels in
a soluble form in E. coli as a thioredoxin fusion protein using
the modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 vector (nucleotides 2599-3132) is replaced
by the DNA sequence encoding murine IL-4, Genbank Accession No.
M13238, nucleotides 122 to 477. The thioredoxin/IL-4 fusion gene
is expressed under the conditions described for thioredoxin/IL-11
in Example 2. The culture growth temperature used in this case
is 15 °C. Under these conditions the majority of the
thioredoxin/IL-4 fusion protein accumulates in the soluble
cellular fraction. The fusion protein can be cleaved using the
enterokinase treatment described in Example 2.


EXAMPLE 8 - THIOREDOXIN/IL-5 FUSION MOLECULE

Murine interleukin 5 (IL-5) is produced at high levels in
a soluble form in E. coli as a thioredoxin fusion protein using
the modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 vector (nucleotides 2599-3132) is replaced
by the DNA sequence encoding murine IL-5, Genbank Accession No.
X04601, nucleotides 107 to 443. The thioredoxin/murine IL-5
fusion gene is expressed under the conditions described for
thioredoxin/IL-11 in Example 2. The culture growth temperature
used in this case is 15 °C. Under these conditions the majority
of the thioredoxin/murine IL-5 fusion protein accumulates in the
soluble cellular fraction. The fusion protein can be cleaved
using the enterokinase treatment described in Example 2.

EXAMPLE 9 - THIOREDOXIN/LIF FUSION MOLECULE 

Murine LIF is produced at high levels in a soluble form in
E. coli as a thioredoxin fusion protein using the modified
expression vector described in Example 3. The DNA sequence
encoding human IL-11 in the modified pALtrxA/EK/IL11ΔPro-581
vector (nucleotides 2599-3132) is replaced by the DNA sequence
encoding murine LIF, Genbank Accession No. X12810, nucleotides
123 to 734. The thioredoxin/LIF fusion gene is expressed under
the conditions described for thioredoxin/IL-11 in Example 2. The
culture growth temperature used in this case is 25 °C. Under
these conditions the majority of the thioredoxin/LIF fusion
protein accumulates in the soluble cellular fraction. The fusion
protein can be cleaved using the enterokinase treatment described
in Example 2.

EXAMPLE 10 - THIOREDOXIN/STEEL FACTOR FUSION MOLECULE

Murine Steel Factor is produced at high levels in a soluble form
in E. coli as a thioredoxin fusion protein using the modified
expression vector described in Example 3. The DNA sequence
encoding human IL-11 in the modified pALtrxA/EK/IL11ΔPro-581
vector (nucleotides 2599-3132) is replaced by the DNA sequence
encoding murine Steel Factor, Genbank Accession No. M59915,
nucleotides 91 to 583. The thioredoxin/Steel Factor fusion gene
is expressed under the conditions described for thioredoxin/IL-11
in Example 2. The culture growth temperature used in this case
is 37 °C. Under these conditions the majority of the
thioredoxin/Steel Factor fusion protein accumulates in the
soluble cellular fraction. The fusion protein can be cleaved
using the enterokinase treatment described in Example 2.


EXAMPLE 11 - THIOREDOXIN/MIF FUSION MOLECULE

Human Macrophage Inhibitory Factor (MIF) is produced at high
levels in a soluble form in E. coli as a thioredoxin fusion protein
using the modified expression vector described in Example 3. The
DNA sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 vector (nucleotides 2599-3132) is
replaced by the DNA sequence encoding human MIF, Genbank
Accession No. M25639, nucleotides 51 to 397. The thioredoxin/MIF
fusion gene is expressed under the conditions described for
thioredoxin/IL-11 in Example 2. The culture growth temperature
used in this case is 37 °C. Under these conditions the majority
of the thioredoxin/MIF fusion protein accumulates in the soluble
cellular fraction. The fusion protein can be cleaved using the
enterokinase treatment described in Example 2.
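
The constructs of Examples 8 to 11 differ only in the replacement sequence and the culture growth temperature, so they can be tabulated compactly. The Python sketch below is an illustrative summary and not part of the original disclosure. The `Construct` class and its field names are hypothetical; the accession numbers, nucleotide coordinates and temperatures are copied from the text as printed, and the lengths are simple inclusive counts of those coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    """One thioredoxin fusion construct (hypothetical record layout)."""
    example: int
    protein: str
    accession: str
    first_nt: int       # first Genbank nucleotide of the replacement sequence
    last_nt: int        # last Genbank nucleotide of the replacement sequence
    growth_temp_c: int  # culture growth temperature stated in the example

    @property
    def insert_length(self) -> int:
        # Inclusive count of the quoted coordinate range.
        return self.last_nt - self.first_nt + 1


# Values as stated in Examples 8 to 11.
CONSTRUCTS = [
    Construct(8, "murine IL-5", "X04601", 107, 443, 15),
    Construct(9, "murine LIF", "X12810", 123, 734, 25),
    Construct(10, "murine Steel Factor", "M59915", 91, 583, 37),
    Construct(11, "human MIF", "M25639", 51, 397, 37),
]

# Vector segment replaced in every case (human IL-11 coding region).
REPLACED_SEGMENT = (2599, 3132)

if __name__ == "__main__":
    for c in CONSTRUCTS:
        print(f"Example {c.example}: {c.protein} ({c.accession}), "
              f"{c.insert_length} nt, grown at {c.growth_temp_c} °C")
```

Keeping the temperature next to each construct makes the pattern in the examples easy to see: the growth temperature that gives a mostly soluble product is chosen separately for each fusion.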

EXAMPLE 12 - THIOREDOXIN/SMALL PEPTIDE FUSION MOLECULES

Native E. coli thioredoxin can be expressed at high levels
in E. coli using strain GI724 containing the same plasmid
expression vector described in Example 3 deleted for nucleotides
2569-3129, and employing the growth and induction protocol
outlined in Example 1. Under these conditions thioredoxin
accumulated to approximately 10% of the total protein, all of it
in the soluble cellular fraction.

Fig. 4 illustrates insertion of 13 amino acid residues
containing an enterokinase cleavage site into the active site loop
of thioredoxin, between residues G34 and P35 of the thioredoxin
protein sequence. The fusion protein containing this internal
enterokinase site was expressed at levels equivalent to native
thioredoxin, and was cleaved with an enterokinase treatment as
outlined in Example 1 above. The fusion protein was found to be
as stable as native thioredoxin to heat treatments, being
resistant to a 10 minute incubation at 80 °C as described in
Example 4.

Below are listed twelve additional peptide insertions which
were also made into the active site loop of thioredoxin between
G34 and P35. The sequences are each 14 amino acid residues in
length and are random in composition. Each of the thioredoxin
fusion proteins containing these random insertions was made at
levels comparable to native thioredoxin. All of them were found
in the soluble cellular fraction. These peptides include the
following sequences:

Pro-Leu-Gln-Arg-Ile-Pro-Pro-Gln-Ala-Leu-Arg-Val-Glu-Gly (SEQ ID NO:1),

Pro-Arg-Asp-Cys-Val-Gln-Arg-Gly-Lys-Ser-Leu-Ser-Leu-Gly (SEQ ID NO:2),

Pro-Met-Arg-His-Asp-Val-Arg-Cys-Val-Leu-His-Gly-Thr-Gly (SEQ ID NO:3),

Pro-Gly-Val-Arg-Leu-Pro-Ile-Cys-Tyr-Asp-Asp-Ile-Arg-Gly (SEQ ID NO:4),

Pro-Lys-Phe-Ser-Asp-Gly-Ala-Gln-Gly-Leu-Gly-Ala-Val-Gly (SEQ ID NO:5),

Pro-Pro-Ser-Leu-Val-Gln-Asp-Asp-Ser-Phe-Glu-Asp-Arg-Gly (SEQ ID NO:6),

Pro-Trp-Ile-Asn-Gly-Ala-Thr-Pro-Val-Lys-Ser-Ser-Ser-Gly (SEQ ID NO:7),

Pro-Ala-His-Arg-Phe-Arg-Gly-Gly-Ser-Pro-Ala-Ile-Phe-Gly (SEQ ID NO:8),

Pro-Ile-Met-Gly-Ala-Ser-His-Gly-Glu-Arg-Gly-Pro-Glu-Gly (SEQ ID NO:9),

Pro-Asp-Ser-Leu-Arg-Arg-Arg-Glu-Gly-Phe-Gly-Leu-Leu-Gly (SEQ ID NO:10),

Pro-Ser-Glu-Tyr-Pro-Gly-Leu-Ala-Thr-Gly-His-His-Val-Gly (SEQ ID NO:11),

and Pro-Leu-Gly-Val-Leu-Gly-Ser-Ile-Trp-Leu-Glu-Arg-Gln-Gly (SEQ ID NO:12).

The inserted sequences contained examples that were both
hydrophobic and hydrophilic, and examples that contained cysteine
residues. It appears that the active-site loop of thioredoxin
can tolerate a wide variety of peptide insertions resulting in
soluble fusion proteins. Standard procedures can be used to
purify these loop "inserts".
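
The properties stated for the twelve insertions (each 14 residues long, some containing cysteine) can be checked directly from the listed sequences. The following Python sketch is offered only as an illustration; the function and variable names are hypothetical, and the three-letter to one-letter mapping is the standard amino acid abbreviation table.

```python
# Standard three-letter to one-letter amino acid abbreviations.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Loop insertions SEQ ID NO:1 through SEQ ID NO:12, as listed in Example 12.
INSERTS = [
    "Pro-Leu-Gln-Arg-Ile-Pro-Pro-Gln-Ala-Leu-Arg-Val-Glu-Gly",
    "Pro-Arg-Asp-Cys-Val-Gln-Arg-Gly-Lys-Ser-Leu-Ser-Leu-Gly",
    "Pro-Met-Arg-His-Asp-Val-Arg-Cys-Val-Leu-His-Gly-Thr-Gly",
    "Pro-Gly-Val-Arg-Leu-Pro-Ile-Cys-Tyr-Asp-Asp-Ile-Arg-Gly",
    "Pro-Lys-Phe-Ser-Asp-Gly-Ala-Gln-Gly-Leu-Gly-Ala-Val-Gly",
    "Pro-Pro-Ser-Leu-Val-Gln-Asp-Asp-Ser-Phe-Glu-Asp-Arg-Gly",
    "Pro-Trp-Ile-Asn-Gly-Ala-Thr-Pro-Val-Lys-Ser-Ser-Ser-Gly",
    "Pro-Ala-His-Arg-Phe-Arg-Gly-Gly-Ser-Pro-Ala-Ile-Phe-Gly",
    "Pro-Ile-Met-Gly-Ala-Ser-His-Gly-Glu-Arg-Gly-Pro-Glu-Gly",
    "Pro-Asp-Ser-Leu-Arg-Arg-Arg-Glu-Gly-Phe-Gly-Leu-Leu-Gly",
    "Pro-Ser-Glu-Tyr-Pro-Gly-Leu-Ala-Thr-Gly-His-His-Val-Gly",
    "Pro-Leu-Gly-Val-Leu-Gly-Ser-Ile-Trp-Leu-Glu-Arg-Gln-Gly",
]


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


ONE_LETTER = [to_one_letter(p) for p in INSERTS]

# SEQ ID numbers (1-based) of the insertions that contain cysteine.
CYS_CONTAINING = [i for i, s in enumerate(ONE_LETTER, start=1) if "C" in s]

if __name__ == "__main__":
    for seq_id, seq in enumerate(ONE_LETTER, start=1):
        print(f"SEQ ID NO:{seq_id}  {seq}  length={len(seq)}")
    print("cysteine-containing:", CYS_CONTAINING)
```

Every listed insertion begins with Pro and ends with Gly, which matches the fixed flanking codons of the random oligonucleotide design shown in Fig. 5.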

EXAMPLE 13 - HUMAN INTERLEUKIN-6 

Human interleukin-6 (IL-6) is expressed at high levels
in E. coli as a thioredoxin fusion protein using an expression
vector similar to modified pALtrxA/EK/IL11ΔPro-581 described in
Example 3 above. To express a thioredoxin-IL6 fusion, the DNA
sequence in modified pALtrxA/EK/IL11ΔPro-581 encoding human IL-11
(nucleotides 2599-3132) is replaced by the 561 nucleotide DNA
sequence (SEQ ID NO:19) shown in Figure 6 encoding full-length,
mature human IL-6 [Hirano et al, Nature, 324:73-76 (1986)]. The
host strain and expression protocol used for the production of
thioredoxin/IL-6 fusion protein are as described in Example 1.

When the fusion protein was synthesized at 37 °C,
approximately 50% of it was found in the "inclusion body" or
insoluble fraction. However, all of the thioredoxin-IL6 fusion
protein, representing up to 10% of the total cellular protein,
was found in the soluble fraction when the temperature of
synthesis was lowered to 25 °C.

EXAMPLE 14 - HUMAN MACROPHAGE COLONY STIMULATING FACTOR 

Human Macrophage Colony Stimulating Factor (M-CSF) can be
expressed at high levels in E. coli as a thioredoxin fusion
protein using a modified expression vector similar to
pALtrxA/EK/IL11ΔPro-581 described in Example 3 above.

The DNA sequence encoding human IL-11 in modified
pALtrxA/EK/IL11ΔPro-581 (nucleotides 2599-3135) is replaced by
the 669 nucleotide DNA sequence shown in Fig. 7 encoding the
first 223 amino acids of mature human M-CSFβ [G. G. Wong et al,
Science, 235:1504-1508 (1987)]. The host strain and expression
protocol used for the production of thioredoxin/M-CSF fusion
protein were as described in Example 2 above.

As was seen with the thioredoxin/IL-11 fusion protein, all
of the thioredoxin/M-CSF fusion protein was found in the soluble
cellular fraction, representing up to 10% of the total protein.
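
The insert sizes quoted in Examples 13 and 14 can be cross-checked with simple codon arithmetic. The Python sketch below is illustrative only; the function names are hypothetical. Whether a given insert carries its own stop codon is read from the figures (the IL-6 sequence of Fig. 6 ends in TAG, the IL-11 reading frame of Fig. 1 ends in TGA at nucleotide 3132, and the M-CSF sequence of Fig. 7 is taken to have no stop codon) and is treated here as an assumption.

```python
def segment_length(first_nt: int, last_nt: int) -> int:
    """Inclusive length of a nucleotide coordinate range."""
    return last_nt - first_nt + 1


def encoded_residues(n_nucleotides: int, includes_stop: bool) -> int:
    """Number of amino acids encoded by an in-frame coding segment."""
    codons, remainder = divmod(n_nucleotides, 3)
    if remainder:
        raise ValueError("segment length is not a whole number of codons")
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # M-CSF insert of Example 14: 669 nt, assumed to carry no stop codon.
    print("M-CSF residues:", encoded_residues(669, includes_stop=False))
    # IL-6 insert of Example 13: 561 nt including the terminal TAG.
    print("IL-6 residues:", encoded_residues(561, includes_stop=True))
    # IL-11 segment replaced in the vector: nucleotides 2599-3132.
    il11 = segment_length(2599, 3132)
    print("IL-11 segment:", il11, "nt,",
          encoded_residues(il11, includes_stop=True), "residues")
```

The 669 nucleotide M-CSF insert corresponds to exactly 223 codons, in agreement with the "first 223 amino acids" stated in the text.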

EXAMPLE 15 - RELEASE OF FUSION PROTEIN VIA OSMOTIC SHOCK

To determine whether or not the fusions of heterologous
proteins to thioredoxin according to this invention enable
targeting to the host cell's adhesion sites and permit the
release of the fusion proteins from the cell, the cells were
exposed to simple osmotic shock and freeze/thaw procedures.

Cells overproducing wild-type E. coli thioredoxin, human
thioredoxin, the E. coli thioredoxin-MIP1α fusion or the E. coli
thioredoxin-IL11 fusion were used in the following procedures.

For an osmotic shock treatment, cells were resuspended at
2 A550/ml in 20 mM Tris-Cl pH 8.0/2.5 mM EDTA/20% w/v sucrose and
kept cold on ice for 10 minutes. The cells were then pelleted
by centrifugation (12,000 x g, 30 seconds) and gently resuspended
in the same buffer as above but with sucrose omitted. After an
additional 10 minute period on ice, to allow for the osmotic
release of proteins, cells were re-pelleted by centrifugation
(12,000 x g, 2 minutes) and the supernatant ("shockate") examined
for its protein content. Wild-type E. coli thioredoxin and human
thioredoxin were quantitatively released, giving "shockate"
preparations which were >80% pure thioredoxin. More
significantly, >80% of the thioredoxin-MIP1α and >50% of the
thioredoxin-IL11 fusion proteins were released by this osmotic
treatment.
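
The resuspension density used in the osmotic shock treatment is a simple proportionality, sketched below in Python as an illustration. The sketch reads the density unit as absorbance at 550 nm per ml and assumes that absorbance scales linearly with cell density. The function name, the step list and the example culture values are hypothetical; the buffer composition, times and centrifugation settings are those given in the text.

```python
def resuspension_volume_ml(culture_absorbance: float,
                           culture_volume_ml: float,
                           target_density: float = 2.0) -> float:
    """Buffer volume (ml) that brings harvested cells to the target density.

    culture_absorbance * culture_volume_ml gives the total absorbance units
    harvested; dividing by the target (absorbance units per ml) gives the
    resuspension volume.
    """
    if culture_absorbance <= 0 or culture_volume_ml <= 0 or target_density <= 0:
        raise ValueError("all quantities must be positive")
    return culture_absorbance * culture_volume_ml / target_density


# The osmotic shock steps, as described in Example 15.
OSMOTIC_SHOCK_STEPS = [
    ("resuspend in 20 mM Tris-Cl pH 8.0 / 2.5 mM EDTA / 20% w/v sucrose",
     "10 min on ice"),
    ("pellet cells", "12,000 x g, 30 s"),
    ("gently resuspend in the same buffer without sucrose", "10 min on ice"),
    ("re-pellet cells", "12,000 x g, 2 min"),
    ("collect supernatant (shockate) and examine protein content", ""),
]

if __name__ == "__main__":
    # Hypothetical culture: 10 ml at an absorbance of 4.0.
    print(resuspension_volume_ml(4.0, 10.0), "ml of sucrose buffer")
    for number, (action, condition) in enumerate(OSMOTIC_SHOCK_STEPS, start=1):
        print(number, action, condition)
```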

A simple freeze/thaw procedure produced similar results,
releasing thioredoxin fusion proteins selectively, while leaving
most of the other cellular proteins inside the cell. A typical
freeze/thaw procedure entails resuspending cells at 2 A550/ml in
20 mM Tris-Cl pH 8.0/2.5 mM EDTA and quickly freezing the
suspension in dry ice or liquid nitrogen. The frozen suspension
is then allowed to slowly thaw before spinning out the cells
(12,000 x g, 2 minutes) and examining the supernatant for protein.
Although the resultant "shockate" may require additional
purification, the initial "shockate" is characterized by the
absence of nucleic acid contaminants. Thus, compared to an
initial lysate, the purity of the "shockate" is significantly
better, and does not require the difficult removal of DNA from
bacterial lysates. Fewer additional steps should be required for
total purity of the "shockate".

Numerous modifications and variations of the present
invention are included in the above-identified specification and
are expected to be obvious to one of skill in the art. Such
modifications and alterations to the compositions and processes
of the present invention are believed to be encompassed in the
scope of the claims appended hereto.

WHAT IS CLAIMED IS:

1. A DNA sequence comprising DNA encoding one or more
thioredoxin-like proteins fused to a DNA sequence encoding a
selected, desired peptide or protein, said sequence capable of
encoding a fusion protein.

2. The sequence according to claim 1 wherein said
thioredoxin-like DNA sequence is selected from the group
consisting of the E. coli thioredoxin [sequence SEQ ID NO:21],
human thioredoxin, glutaredoxin, and the thioredoxin-like domains
of protein disulfide isomerase, form-1 phosphoinositide-specific
phospholipase C and ERp72.

3. The sequence according to claim 1 wherein said selected
peptide or protein is selected from the group consisting of
IL-11, IL-6 [SEQ ID NO:20], Macrophage Inhibitory Protein 1α [SEQ
ID NO:16], Bone Morphogenic Protein 2 [SEQ ID NO:18], IL-2, IL-3,
IL-4, IL-5, MIF, LIF, Steel Factor and randomly generated peptide
sequences [SEQ ID NO:1 through SEQ ID NO:12].

4. The sequence according to claim 1 further comprising a 
linker peptide between the thioredoxin-like sequence and said 
selected peptide or protein, said linker providing a selected 
cleavage site and preventing steric hindrance between the 
thioredoxin-like molecule and said selected peptide or protein. 

5. A plasmid DNA molecule comprising DNA encoding one or
more thioredoxin-like proteins fused to the DNA sequence encoding
a selected, desired peptide or protein, said fusion sequence
under the control of an expression control sequence comprising
a promoter functional in E. coli, a ribosome binding site, an
origin of replication and an optional selectable marker, said
control sequence capable of directing the expression of a fusion
protein in a selected host cell.

6. A host cell transformed with, or having integrated into
the genome thereof, a DNA molecule comprising a DNA sequence
encoding at least one thioredoxin-like protein fused to a DNA
sequence encoding a selected, desired peptide or protein, said
fusion sequence under the control of an expression control
sequence capable of directing the expression of a cytoplasmic
fusion protein.

7. A fusion protein comprising a thioredoxin-like protein 
fused in frame to a selected, desired peptide or protein. 

8. A method for increasing the expression of a selected
recombinant protein comprising culturing under suitable
conditions a host cell transformed with, or having integrated
into the genome thereof, a DNA molecule comprising a DNA sequence
encoding at least one thioredoxin-like protein fused to a DNA
sequence encoding said heterologous protein, said fusion sequence
under the control of an expression control sequence capable of
directing the expression of a fusion protein; recovering said
fusion protein from said culture; and optionally cleaving said
protein from fusion with said thioredoxin-like protein.

9. The method according to claim 8 wherein said recovering 
step comprises treating said transformed and cultured cells by 
osmotic shock to release said fusion protein from the cell. 

10. The method according to claim 8 wherein said recovering 
step comprises treating said transformed and cultured cells by 
freezing and thawing to release said fusion protein from the 
cell. 
FIG. 1/7

pALtrxA/EK/IL11ΔPro-581

SEQ ID NO:13 and SEQ ID NO:14

GACGAAAGGG CCTCGTGATA CGCCTATTTT TATAGGTTAA    40
TGTCATGATA ATAATGGTTT CTTAGACGTC AGGTGGCACT    80
TTTCGGGGAA ATGTGCGCGG AACCCCTATT TGTTTATTTT   120
TCTAAATACA TTCAAATATG TATCCGCTCA TGAGACAATA   160
ACCCTGATAA ATGCTTCAAT AATATTGAAA AAGGAAGAGT   200
[illegible] [illegible] TGTCGCCCTT ATTCCCTTTT   240
[illegible] [illegible] GTTTTTGCTC ACCCAGAAAC   280
GCTGGTGAAA GTAAAAGATG CTGAAGATCA GTTGGGTGCA   320
CGAGTGGGTT ACATCGAACT GGATCTCAAC AGCGGTAAGA   360
TCCTTGAGAG TTTTCGCCCC GAAGAACGTT TTCCAATGAT   400
GAGCACTTTT AAAGTTCTGC TATGTGGCGC GGTATTATCC   440
CGTATTGACG CCGGGCAAGA GCAACTCGGT CGCCGCATAC   480
ACTATTCTCA GAATGACTTG GTTGAGTACT CACCAGTCAC   520
AGAAAAGCAT CTTACGGATG GCATGACAGT AAGAGAATTA   560
TGCAGTGCTG CCATAACCAT GAGTGATAAC ACTGCGGCCA   600
ACTTACTTCT GACAACGATC GGAGGACCGA AGGAGCTAAC   640
CGCTTTTTTG CACAACATGG GGGATCATGT AACTCGCCTT   680
GATCGTTGGG [illegible] GAATGAAGCC ATACCAAACG   720
ACGAGCGTGA CACCACGATG CCTGTAGCAA TGGCAACAAC   760
[illegible] [illegible] GCGAACTACT TACTCTAGCT   800
[illegible] [illegible] CTGGATGGAG GCGGATAAAG   840
TTGCAGGACC ACTTCTGCGC TCGGCCCTTC CGGCTGGCTG   880
GTTTATTGCT GATAAATCTG GAGCCGGTGA GCGTGGGTCT   920
CGCGGTATCA TTGCAGCACT GGGGCCAGAT GGTAAGCCCT   960
CCCGTATCGT AGTTATCTAC ACGACGGGGA GTCAGGCAAC  1000
TATGGATGAA CGAAATAGAC AGATCGCTGA GATAGGTGCC  1040
TCACTGATTA AGCATTGGTA ACTGTCAGAC CAAGTTTACT  1080
CATATATACT TTAGATTGAT TTAAAACTTC ATTTTTAATT  1120
TAAAAGGATC TAGGTGAAGA TCCTTTTTGA TAATCTCATG  1160
ACCAAAATCC CTTAACGTGA GTTTTCGTTC CACTGAGCGT  1200
CAGACCCCGT AGAAAAGATC AAAGGATCTT CTTGAGATCC  1240
TTTTTTTCTG CGCGTAATCT GCTGCTTGCA AACAAAAAAA  1280
CCACCGCTAC CAGCGGTGGT TTGTTTGCCG GATCAAGAGC  1320
TACCAACTCT TTTTCCGAAG GTAACTGGCT TCAGCAGAGC  1360
FIG. 1A/7

GCAGATACCA AATACTGTCC TTCTAGTGTA GCCGTAGTTA  1400
GGCCACCACT TCAAGAACTC TGTAGCACCG CCTACATACC  1440
TCGCTCTGCT AATCCTGTTA CCAGTGGCTG CTGCCAGTGG  1480
CGATAAGTCG TGTCTTACCG GGTTGGACTC AAGACGATAG  1520
TTACCGGATA AGGCGCAGCG GTCGGGCTGA ACGGGGGGTT  1560
CGTGCACACA GCCCAGCTTG GAGCGAACGA CCTACACCGA  1600
ACTGAGATAC CTACAGCGTG AGCATTGAGA AAGCGCCACG  1640
CTTCCCGAAG GGAGAAAGGC GGACAGGTAT CCGGTAAGCG  1680
GCAGGGTCGG AACAGGAGAG CGCACGAGGG AGCTTCCAGG  1720
GGGAAACGCC TGGTATCTTT ATAGTCCTGT CGGGTTTCGC  1760
CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TGCTCGTCAG  1800
GGGGGCGGAG CCTATGGAAA AACGCCAGCA ACGCGGCCTT  1840
TTTACGGTTC CTGGCCTTTT GCTGGCCTTT TGCTCACATG  1880
TTCTTTCCTG CGTTATCCCC TGATTCTGTG GATAACCGTA  1920
TTACCGCCTT TGAGTGAGCT GATACCGCTC GCCGCAGCCG  1960
AACGACCGAG CGCAGCGAGT CAGTGAGCGA GGAAGCGGAA  2000
GAGCGCCCAA TACGCAAACC GCCTCTCCCC GCGCGTTGGC  2040
CGATTCATTA ATGCAGAATT GATCTCTCAC CTACCAAACA  2080
ATGCCCCCCT GCAAAAAATA AATTCATATA AAAAACATAC  2120
AGATAACCAT CTGCGGTGAT AAATTATCTC TGGCGGTGTT  2160
GACATAAATA CCACTGGCGG TGATACTGAG CACATCAGCA  2200
GGACGCACTG ACCACCATGA ATTCAAGAAG GAGATATACA  2240

T ATG AGC GAT AAA ATT ATT CAC CTG ACT GAC GAC  2274
  Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp
  1               5                   10

AGT TTT GAC ACG GAT GTA CTC AAA GCG GAC GGG  2307
Ser Phe Asp Thr Asp Val Leu Lys Ala Asp Gly
            15                  20

GCG ATC CTC GTC GAT TTC TGG GCA GAG TGG TGC  2340
Ala Ile Leu Val Asp Phe Trp Ala Glu Trp Cys
        25                  30

GGT CCG TGC AAA ATG ATC GCC CCG ATT CTG GAT  2373
Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp
    35                  40

GAA ATC GCT GAC GAA TAT CAG GGC AAA CTG ACC  2406
Glu Ile Ala Asp Glu Tyr Gln Gly Lys Leu Thr
45                  50                  55
WO 94/02502    PCT/US93/06913

FIG. 1B/7

GTT GCA AAA CTG AAC ATC GAT CAA AAC CCT GGC  2439
Val Ala Lys Leu Asn Ile Asp Gln Asn Pro Gly
                60                  65

ACT GCG CCG AAA TAT GGC ATC CGT GGT ATC CCG  2472
Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro
            70                  75

ACT CTG CTG CTG TTC AAA AAC GGT GAA GTG GCG  2505
Thr Leu Leu Leu Phe Lys Asn Gly Glu Val Ala
        80                  85

GCA ACC AAA GTG GGT GCA CTG TCT AAA GGT CAG  2538
Ala Thr Lys Val Gly Ala Leu Ser Lys Gly Gln
    90                  95

TTG AAA GAG TTC CTC GAC GCT AAC CTG GCC GGT  2571
Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly
100                 105                 110

TCT GGT TCT GGT GAT GAC GAT GAC AAA GGT CCA  2604
Ser Gly Ser Gly Asp Asp Asp Asp Lys Gly Pro
                115                 120

CCA CCA GGT CCA CCT CGA GTT TCC CCA GAC CCT  2637
Pro Pro Gly Pro Pro Arg Val Ser Pro Asp Pro
            125                 130

CGG GCC GAG CTG GAC AGC ACC GTG CTC CTG ACC  2670
Arg Ala Glu Leu Asp Ser Thr Val Leu Leu Thr
        135                 140

CGC TCT CTC CTG GCG GAC ACG CGG CAG CTG GCT  2703
Arg Ser Leu Leu Ala Asp Thr Arg Gln Leu Ala
    145                 150

GCA CAG CTG AGG GAC AAA TTC CCA GCT GAC GGG  2736
Ala Gln Leu Arg Asp Lys Phe Pro Ala Asp Gly
155                 160                 165

GAC CAC AAC CTG GAT TCC CTG CCC ACC CTG GCC  2769
Asp His Asn Leu Asp Ser Leu Pro Thr Leu Ala
                170                 175

ATG AGT GCG GGG GCA CTG GGA GCT CTA CAG CTC  2802
Met Ser Ala Gly Ala Leu Gly Ala Leu Gln Leu
            180                 185

CCA GGT GTG CTG ACA AGG CTG CGA GCG GAC CTA  2835
Pro Gly Val Leu Thr Arg Leu Arg Ala Asp Leu
        190                 195
FIG. 1C/7

CTG TCC TAC CTG CGG CAC GTG CAG TGG CTG CGC  2868
Leu Ser Tyr Leu Arg His Val Gln Trp Leu Arg
    200                 205

CGG GCA GGT GGC TCT TCC CTG AAG ACC CTG GAG  2901
Arg Ala Gly Gly Ser Ser Leu Lys Thr Leu Glu
210                 215                 220

CCC GAG CTG GGC ACC CTG CAG GCC CGA CTG GAC  2934
Pro Glu Leu Gly Thr Leu Gln Ala Arg Leu Asp
                225                 230

CGG CTG CTG CGC CGG CTG CAG CTC CTG ATG TCC  2967
Arg Leu Leu Arg Arg Leu Gln Leu Leu Met Ser
            235                 240

CGC CTG GCC CTG CCC CAG CCA CCC CCG GAC CCG  3000
Arg Leu Ala Leu Pro Gln Pro Pro Pro Asp Pro
        245                 250

CCG GCG CCC CCG CTG GCG CCC CCC TCC TCA GCC  3033
Pro Ala Pro Pro Leu Ala Pro Pro Ser Ser Ala
    255                 260

TGG GGG GGC ATC AGG GCC GCC CAC GCC ATC CTG  3066
Trp Gly Gly Ile Arg Ala Ala His Ala Ile Leu
265                 270                 275

GGG GGG CTG CAC CTG ACA CTT GAC TGG GCC GTG  3099
Gly Gly Leu His Leu Thr Leu Asp Trp Ala Val
                280                 285

AGG GGA CTG CTG CTG CTG AAG ACT CGG CTG TGA  3132
Arg Gly Leu Leu Leu Leu Lys Thr Arg Leu
            290                 295

AAGCTTATCG ATACCGTCGA CCTGCAGTAA TCGTACAGGG  3172
TAGTACAAAT AAAAAAGGCA CGTCAGATGA CGTGCCTTTT  3212
TTCTTGTGAG CAGTAAGCTT GGCACTGGCC GTCGTTTTAC  3252
AACGTCGTGA CTGGGAAAAC CCTGGCGTTA CCCAACTTAA  3292
TCGCCTTGCA GCACATCCCC CTTTCGCCAG CTGGCGTAAT  3332
AGCGAAGAGG CCCGCACCGA TCGCCCTTCC CAACAGTTGC  3372
GCAGCCTGAA TGGCGAATGG CGCCTGATGC GGTATTTTCT  3412
CCTTACGCAT CTGTGCGGTA TTTCACACCG CATATATGGT  3452
FIG. 1D/7

GCACTCTCAG TACAATCTGC TCTGATGCCG CATAGTTAAG  3492
CCAGCCCCGA CACCCGCCAA CACCCGCTGA CGCGCCCTGA  3532
CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC AGACAAGCTG  3572
TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC  3612
GTCATCACCG AAACGCGCGA                        3632
FIG. 2/7

MIP-1α

SEQ ID NO:15 and SEQ ID NO:16

GCA CCA CTT GCT GCT GAC ACG CCG ACC GCC TGC TGC  36
Ala Pro Leu Ala Ala Asp Thr Pro Thr Ala Cys Cys
1               5                   10

TTC AGC TAC ACC TCC CGA CAG ATT CCA CAG AAT TTC  72
Phe Ser Tyr Thr Ser Arg Gln Ile Pro Gln Asn Phe
        15                  20

ATA GCT GAC TAC TTT GAG ACG AGC AGC CAG TGC TCC  109
Ile Ala Asp Tyr Phe Glu Thr Ser Ser Gln Cys Ser
25                  30                  35

AAG CCC AGT GTC ATC TTC CTA ACC AAG AGA GGC CGG  145
Lys Pro Ser Val Ile Phe Leu Thr Lys Arg Gly Arg
            40                  45

CAG GTC TGT GCT GAC CCC AGT GAG GAG TGG GTC CAG  181
Gln Val Cys Ala Asp Pro Ser Glu Glu Trp Val Gln
    50                  55                  60

AAA TAC GTC AGT GAC CTG GAG CTG AGT GCC TAA  214
Lys Tyr Val Ser Asp Leu Glu Leu Ser Ala
                65                  70
FIG. 3/7

BMP-2

SEQ ID NO:17 and SEQ ID NO:18

CAA GCT AAA CAT AAA CAA CGT AAA CGT CTG AAA TCT  36
Gln Ala Lys His Lys Gln Arg Lys Arg Leu Lys Ser
1               5                   10

AGC TGT AAG AGA CAC CCT TTG TAC GTG GAC TTC AGT  72
Ser Cys Lys Arg His Pro Leu Tyr Val Asp Phe Ser
        15                  20

GAC GTG GGG TGG AAT GAC TGG ATT GTG GCT CCC CCG  109
Asp Val Gly Trp Asn Asp Trp Ile Val Ala Pro Pro
25                  30                  35

GGG TAT CAC GCC TTT TAC TGC CAC GGA GAA TGC CCT  145
Gly Tyr His Ala Phe Tyr Cys His Gly Glu Cys Pro
            40                  45

TTT CCT CTG GCT GAT CAT CTG AAC TCC ACT AAT CAT  181
Phe Pro Leu Ala Asp His Leu Asn Ser Thr Asn His
    50                  55                  60

GCC ATT GTT CAG ACG TTG GTC AAC TCT GTT AAC TCT  217
Ala Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser
                65                  70

AAG ATT CCT AAG GGA TGC TGT GTC CCG ACA GAA CTC  253
Lys Ile Pro Lys Ala Cys Cys Val Pro Thr Glu Leu
        75                  80

AGT GCT ATC TCG ATG CTG TAC CTT GAC GAG AAT GAA  289
Ser Ala Ile Ser Met Leu Tyr Leu Asp Glu Asn Glu
85                  90                  95

AAG GTT GTA TTA AAG AAC TAT CAG GAC ATG GTT GTG  325
Lys Val Val Leu Lys Asn Tyr Gln Asp Met Val Val
            100                 105

GAG GGT TGT GGG TGT CGC TAG  346
Glu Gly Cys Gly Cys Arg
    110
FIG. 4/7

INSERTION OF AN ENTEROKINASE SITE INTO
THE ACTIVE-SITE LOOP OF E. COLI THIOREDOXIN (trxA)

trxA active site loop

           RsrII
             |
...GAGTGGTGCGGTCCGTGCAAAATG...
...CTCACCACGCCAGGCACGTTTTAC...
... E  W  C  G  P  C  K  M ...
    31                   38

RsrII cut

...GAGTGGTGCG      GTCCGTGCAAAATG...
...CTCACCACGCCAG      GCACGTTTTAC...
... E  W  C  G        P  C  K  M ...
    31                         38

Enterokinase site (13 residues)

   gtcactccGACTACAAAGACGACGACGACAAAgcttctg
      tgaggCTGATGTTTCTGCTGCTGCTGTTTcgaagaccag
...   H  S  D  Y  K  D  D  D  D  K  A  S  G ...
                                  ^
                            cleavage site
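
The reading frame of the insertion shown in Fig. 4 can be verified by joining the two halves of the RsrII-cut loop with the top strand of the enterokinase cassette and translating the result. The Python sketch below is an illustration only; the variable and function names are hypothetical, the sequences are copied from the figure, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Top-strand sequences from Fig. 4.
LOOP_LEFT = "GAGTGGTGCG"       # loop sequence up to the RsrII cut
LOOP_RIGHT = "GTCCGTGCAAAATG"  # loop sequence after the RsrII cut
EK_CASSETTE = "gtcactccGACTACAAAGACGACGACGACAAAgcttctg"

native_loop = translate(LOOP_LEFT + LOOP_RIGHT)
modified_loop = translate(LOOP_LEFT + EK_CASSETTE + LOOP_RIGHT)

# Residues gained between the flanking E-W-C-G and P-C-K-M.
inserted = modified_loop[4:-4]

if __name__ == "__main__":
    print("native loop:  ", native_loop)
    print("modified loop:", modified_loop)
    print("inserted:     ", inserted, f"({len(inserted)} residues)")
```

The translation keeps the thioredoxin frame on both sides of the cassette and places the Asp-Asp-Asp-Asp-Lys recognition sequence inside the inserted residues.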
FIG. 5/7

RANDOM PEPTIDE INSERTIONS INTO THE ACTIVE-SITE
LOOP OF E. COLI THIOREDOXIN (trxA)

trxA active site loop

           RsrII
             |
...GAGTGGTGCGGTCCGTGCAAAATG...
...CTCACCACGCCAGGCACGTTTTAC...
... E  W  C  G  P  C  K  M ...
    31                   38

RsrII cut

...GAGTGGTGCG      GTCCGTGCAAAATG...
...CTCACCACGCCAG      GCACGTTTTAC...
... E  W  C  G        P  C  K  M ...
    31                         38

oligos

          (AvaII)           AvaII
5' GACTGACTGGTCCG...(N36)...GGTCCTCAGTCAGTCAG 3'
                         3' CCAGGAGTCAGTCAGTC 5'

random duplex

   GTCCG...(N36)...G
      GC...(N36)...CCAG

insertion into trxA active site loop

...GAGTGGTGCGGTCCG...(N36)...GGTCCGTGCAAAATG...
...CTCACCACGCCAGGC...(N36)...CCAGGCACGTTTTAC...
... E  W  C  G  P ...(X12)... G  P  C  K  M ...
    31                                    38
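
The size of the random library implied by the (N36) segment of Fig. 5 can be estimated with a short calculation. The Python sketch below is illustrative and rests on an assumption not stated in the figure: each N position is taken to be an equimolar mixture of A, C, G and T, so that all 64 codons are equally likely at each of the 12 random codon positions. The function names are hypothetical.

```python
from fractions import Fraction

RANDOM_CODONS = 12      # (N36) = 36 random nucleotides = 12 codons
STOP_CODONS = 3         # TAA, TAG, TGA in the standard genetic code
TOTAL_CODONS = 64


def dna_diversity(n_codons: int) -> int:
    """Number of distinct DNA sequences for n fully random codons."""
    return 4 ** (3 * n_codons)


def peptide_diversity(n_codons: int) -> int:
    """Number of distinct stop-free peptides of n residues (20 amino acids)."""
    return 20 ** n_codons


def stop_free_fraction(n_codons: int) -> Fraction:
    """Fraction of random inserts with no in-frame stop codon."""
    return Fraction(TOTAL_CODONS - STOP_CODONS, TOTAL_CODONS) ** n_codons


if __name__ == "__main__":
    print("DNA sequences:     ", dna_diversity(RANDOM_CODONS))
    print("peptide sequences: ", peptide_diversity(RANDOM_CODONS))
    print("stop-free fraction:", float(stop_free_fraction(RANDOM_CODONS)))
```

Under this assumption a little over half of the random inserts would be free of in-frame stop codons and so could give a full-length loop fusion.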
FIG. 6/7

IL6

SEQ ID NO:19 and SEQ ID NO:20

5 10
ATG GCT CCA GTA CCT CCA GGT GAA GAT TCT AAA GAT GTA  39
Met Ala Pro Val Pro Pro Gly Glu Asp Ser Lys Asp Val

15 20 25
GCC GCC CCA CAC AGA CAG CCA CTC ACC TCT TCA GAA CGA  78
Ala Ala Pro His Arg Gln Pro Leu Thr Ser Ser Glu Arg

30 35
ATT GAC AAA CAA ATT CGG TAC ATC CTC GAC GGC ATC TCA  117
Ile Asp Lys Gln Ile Arg Tyr Ile Leu Asp Gly Ile Ser

40 45 50
GCC CTG AGA AAG GAG ACA TGT AAC AAG AGT AAC ATG TGT  156
Ala Leu Arg Lys Glu Thr Cys Asn Lys Ser Asn Met Cys

55 60
GAA AGC AGC AAA GAG GCA CTG GCA GAA AAC AAC CTG AAC  195
Glu Ser Ser Lys Glu Ala Leu Ala Glu Asn Asn Leu Asn

65 70 75
CTT CCA AAG ATG GCT GAA AAA GAT GGA TGC TTC CAA TCT  234
Leu Pro Lys Met Ala Glu Lys Asp Gly Cys Phe Gln Ser

80 85 90
GGA TTC AAT GAG GAG ACT TGC CTG GTG AAA ATC ATC ACT  273
Gly Phe Asn Glu Glu Thr Cys Leu Val Lys Ile Ile Thr

95 100
GGT CTT TTG GAG TTT GAG GTA TAC CTA GAG TAC CTC CAG  312
Gly Leu Leu Glu Phe Glu Val Tyr Leu Glu Tyr Leu Gln

105 110 115
AAC AGA TTT GAG AGT AGT GAG GAA CAA GCC AGA GCT GTG  351
Asn Arg Phe Glu Ser Ser Glu Glu Gln Ala Arg Ala Val
FIG. 6A/7

120 125
CAG ATG AGT ACA AAA GTC CTG ATC CAG TTC CTG CAG AAA  390
Gln Met Ser Thr Lys Val Leu Ile Gln Phe Leu Gln Lys

130 140 150
AAG GCA AAG AAT CTA GAT GCA ATA ACC ACC CCT GAC CCA  429
Lys Ala Lys Asn Leu Asp Ala Ile Thr Thr Pro Asp Pro

155 160
ACC ACA AAT GCC AGC CTG CTG ACG AAG CTG CAG GCA CAG  468
Thr Thr Asn Ala Ser Leu Leu Thr Lys Leu Gln Ala Gln

170 175 180
AAC CAG TGG CTG CAG GAC ATG ACA ACT CAT CTC ATT CTG  507
Asn Gln Trp Leu Gln Asp Met Thr Thr His Leu Ile Leu

185 190
CGC AGC TTT AAG GAG TTC CTG CAG TCC AGC CTG AGG GCT  546
Arg Ser Phe Lys Glu Phe Leu Gln Ser Ser Leu Arg Ala

195
CTT CGG CAA ATG TAG  561
Leu Arg Gln Met *
FIG. 7/7
(SEQ ID NO:23)
(SEQ ID NO:24)

  1  GAAGAAGTTT CTGAATATTG TAGCCACATG ATTGGGAGTG GACACCTGCA
 51  GTCTCTGCAG CGGCTGATTG ACAGTCAGAT GGAGACCTCG TGCCAAATTA
101  CATTTGAGTT TGTAGACCAG GAACAGTTGA AAGATCCAGT GTGCTACCTT
151  AAGAAGGCAT TTCTCCTGGT ACAAGACATA ATGGAGGACA CCATGCGCTT
201  CAGAGATAAC ACCCCCAATG CCATCGCCAT TGTGCAGCTG CAGGAACTCT
251  CTTTGAGGCT GAAGAGCTGC TTCACCAAGG ATTATGAAGA GCATGACAAG
301  GCCTGCGTCC GAACTTTCTA TGAGACACCT CTCCAGTTGC TGGAGAAGGT
351  CAAGAATGTC TTTAATGAAA CAAAGAATCT CCTTGACAAG GACTGGAATA
401  TTTTCAGCAA GAACTGCAAC AACAGCTTTG CTGAATGCTC CAGCCAAGAT
451  GTGGTGACCA AGCCTGATTG CAACTGCCTG TACCCCAAAG CCATCCCTAG
501  CAGTGACCCG GCCTCTGTCT CCCCTCATCA GCCCCTCGCC CCCTCCATGG
551  CCCCTGTGGC TGGCTTGACC TGGGAGGACT CTGAGGGAAC TGAGGGCAGC
601  TCCCTCTTGC CTGGTGAGCA GCCCCTGCAC ACAGTGGATC CAGGCAGTGC
651  CAAGCAGCGG CCACCCAGG